Coffee Space


String Escape

TL;DR

Java doesn’t have a built-in method for escaping a String, so after investigating the existing approaches I decided to write my own. I posted the solution to Stack Overflow.

Problem

I was writing some JSON to a file, but the Strings themselves contained (or theoretically could contain) characters that need escaping, e.g.:

{
  "key" : ""value""
}

Ideally you would escape the speech marks (") so that the structure can be parsed correctly by a JSON parser, i.e.:

{
  "key" : "\"value\""
}

This problem occurs in other places too; this isn’t even the first time I have had to solve something similar. When displaying user comments in HTML, for example, you don’t want users injecting <script> tags that may run malicious JavaScript on other users’ machines. (P.S. Any user-supplied HTML should be considered harmful. It’s a lot safer to filter out all HTML and add it back in later than to try to allow only select HTML through.)
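As a rough illustration of the "filter out all HTML" idea, here is a minimal sketch that neutralises every character HTML treats as markup. The class name HtmlEscaper and the method escapeHtml are hypothetical, and the sketch assumes the result is placed in element content or a quoted attribute value; it is not a complete sanitiser.

```java
final class HtmlEscaper {
  // Hypothetical helper: replaces the five characters that can start or
  // end markup with their HTML entities, in a single pass over the input.
  static String escapeHtml(String s) {
    StringBuilder out = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&':  out.append("&amp;");  break;
        case '<':  out.append("&lt;");   break;
        case '>':  out.append("&gt;");   break;
        case '"':  out.append("&quot;"); break;
        case '\'': out.append("&#39;");  break;
        default:   out.append(c);
      }
    }
    return out.toString();
  }
}
```

With this, a comment containing <script> is displayed as text rather than executed, because the angle brackets never reach the browser as markup.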

Existing

Many of the solutions suggest adding Apache Commons Text as a dependency and using StringEscapeUtils. Some of the other solutions were simply wrong.

Solution

The solution I created:

/**
 * escape()
 *
 * Escape a given String to make it safe to be printed or stored.
 *
 * @param s The input String.
 * @return The output String.
 **/
public static String escape(String s){
  return s.replace("\\", "\\\\")
          .replace("\t", "\\t")
          .replace("\b", "\\b")
          .replace("\n", "\\n")
          .replace("\r", "\\r")
          .replace("\f", "\\f")
          .replace("\'", "\\'")
          .replace("\"", "\\\"");
}

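The escape sequences are taken from Oracle’s list of Java escape sequences. (Note that \\ is escaped first; otherwise the backslashes added by the later replacements would themselves be escaped again.) One caveat for JSON specifically: \' is a Java escape but not a JSON one, so a strict JSON parser will reject it. Drop that replacement if the output must be valid JSON.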
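To see why the backslash has to go first, this small sketch compares the two orderings. The class name EscapeOrderDemo and the method escapeBackslashLast are hypothetical and only there for illustration, and both methods are shortened to two replacements.

```java
final class EscapeOrderDemo {
  // Backslash first, as in the solution above (shortened to two replacements).
  static String escape(String s) {
    return s.replace("\\", "\\\\")
            .replace("\n", "\\n");
  }

  // Hypothetical wrong ordering: the backslash is handled last, so the
  // backslash produced by the newline replacement is escaped a second time.
  static String escapeBackslashLast(String s) {
    return s.replace("\n", "\\n")
            .replace("\\", "\\\\");
  }
}
```

For an input of a, newline, b, the first method yields the four characters a \ n b, while the second yields the five characters a \ \ n b, which a reader would decode as a literal backslash followed by the letter n.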

This solution isn’t as fast as it could be, but it works. Each replace() call scans the String again and can allocate a new copy, so the input is traversed eight times; ideally you would walk the String once and append to a single buffer. For small Strings this is fine. It was more than fine for me, as the Strings I escaped were less than 64 characters, with a theoretical maximum of 836 in the worst case. If you are escaping larger Strings, you can do much better.
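For larger Strings, a single-pass version might look something like the sketch below. It is not the code I posted; the class name SinglePassEscaper is hypothetical, and it assumes the same eight escape sequences as the solution above.

```java
final class SinglePassEscaper {
  // Walks the input once and appends to one StringBuilder instead of
  // chaining eight replace() calls.
  static String escape(String s) {
    // Reserve a little extra room so a few escapes do not force a resize.
    StringBuilder out = new StringBuilder(s.length() + 16);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '\\': out.append("\\\\"); break;
        case '\t': out.append("\\t");  break;
        case '\b': out.append("\\b");  break;
        case '\n': out.append("\\n");  break;
        case '\r': out.append("\\r");  break;
        case '\f': out.append("\\f");  break;
        case '\'': out.append("\\'");  break;
        case '"':  out.append("\\\""); break;
        default:   out.append(c);
      }
    }
    return out.toString();
  }
}
```

Because every character is examined exactly once, the ordering concern with the backslash disappears: an escape that has already been written to the buffer is never looked at again.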

If you’re thinking about this from the perspective of storing data, also consider converting it to a Base64 representation: it is fast, needs only a single pass, and costs roughly a third more space.
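A minimal sketch of the Base64 route, using the java.util.Base64 class that ships with Java 8 and later. The class name Base64Storage and its two method names are hypothetical, and UTF-8 is an assumption about how you want the text stored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Storage {
  // Encodes the String's UTF-8 bytes as Base64 text.
  static String encode(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  // Reverses encode(): Base64 text back to the original String.
  static String decode(String encoded) {
    return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
  }
}
```

The encoded form only uses letters, digits, +, / and =, so it never contains speech marks, backslashes or control characters and can sit inside a JSON string as-is. The trade-off is that the stored value is no longer human-readable.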