Coffee Space


Code Highlighting Part 2

Just yesterday I wrote about a project to improve the code highlighting of this website. The source code for the project can be found here.

Look At It!

Some examples of different modes in different browsers…

Chromium light mode
Firefox dark mode

It mostly works!

What I Done Did?

Check the source code of course, but I essentially did as I said I would:

  1. Simplification - That massive list of keywords is awkward, I think I can do better.

I ripped out the keywords; they were over the top and didn’t add value. I also hate the idea of having to curate and maintain such a long list every time C++ or friends pop out some random new feature.

  1. Comments - It would be nice to properly highlight comments in the source, such as /* this */ and // this.

Per language we can define some basic markup rules. In this case, we select anything inside (", "), (', ') and (#, \n) pairs as strings or comments. For each pair we define the colour we want to use and, where possible, an escape character to watch out for (although the escape handling is not bulletproof).

0001 # script_family()
0002 #
0003 # Make script-family code look nice.
0004 #
0005 # @param string The string to be parsed.
0006 # @param rep_dict The replace dictionary.
0007 # @param esc_dict The escape dictionary.
0008 # @return The modified string.
0009 def script_family(string, rep_dict, esc_dict) :
0010   s_rep = [
0011     ["\"", "\"", string_col,  '\\', '\\'],
0012     ["'",  "'",  string_col,  '\\', '\\'],
0013     ["#",  "\n", comment_col, '\0', '\0'],
0014   ]
0015   string = break_up(string, s_rep)
0016   result = ""
0017   for s in string :
0018     updated = False
0019     for r in s_rep :
0020       if s.startswith(r[0]) :
0021         result += fbeg + r[2] + '">' + multiple_replace(s, esc_dict) + fend
0022         updated = True
0023         break
0024     if not updated :
0025       result += multiple_replace(s, rep_dict)
0026   return result
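
script_family() leans on a multiple_replace() helper that this post doesn't show. Below is a minimal sketch of what such a helper could look like, not the site's actual code: the regex-based single pass and the contents of esc_dict are assumptions made for illustration.

```python
import re

def multiple_replace(string, rep_dict):
    """Replace every key of rep_dict found in string, in one pass."""
    if not rep_dict:
        return string
    # Longest keys first, so a longer key wins over one of its prefixes.
    keys = sorted(rep_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rep_dict[m.group(0)], string)

# Hypothetical escape dictionary: just make the text safe to drop into HTML.
esc_dict = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

print(multiple_replace("if a < b & c > d :", esc_dict))
```

Doing it in one pass matters here: chaining str.replace() calls would risk re-escaping the "&" in an "&lt;" that an earlier replace had just produced.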

I wasted a bunch of time trying to figure out how to do this. There’s not really any example code for this sort of thing… We’re scanning for several start/end pairs that may appear inside one another, and whichever pair opens first claims everything up to its end marker. I really quite like the code I came up with for this:

0027 # break_up()
0028 #
0029 # Break up a given string into parts defined by start and end strings.
0030 #
0031 # NOTE: The escaping performed here is not bulletproof and is relatively dumb.
0032 # It will fail at some edge cases, such as: "\\"
0033 #
0034 # @param string The string to be broken up.
0035 # @param pairs A list of [start string, end string, <ignore>, start escape, end escape].
0036 def break_up(string, pairs, start = 0) :
0037   s_beg = -1
0038   s_end = -1
0039   for p in pairs :
0040     # Do we beat our current best start?
0041     s_tmp = string.find(p[0], start)
0042     if s_tmp >= start and (s_tmp < s_beg or s_beg < 0) :
0043       # Ensure we don't meet an escape character
0044       if s_tmp > 0 and string[s_tmp - 1] == p[3] :
0045         continue
0046       s_beg = s_tmp
0047       # Try to find an end point
0048       s_end = string.find(p[1], s_beg + 1)
0049       if s_end >= start :
0050         while string[s_end - 1] == p[4] :
0051           s_end = string.find(p[1], s_end + 1)
0052           if s_end < 0 :
0053             s_end = len(string)
0054             break
0055         else :
0056           s_end += len(p[1])
0057       else :
0058         s_end = len(string)
0059   if s_beg < start or s_end < start :
0060     return [string[start:len(string)]]
0061   else :
0062     result = [string[start:s_beg], string[s_beg:s_end]]
0063     result += break_up(string, pairs, s_end)
0064     return result
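
For a feel of what break_up() hands back, here is a small run against a line that hides a # inside a string. The function body is copied from the listing above with the line numbers dropped; the sample input and the None colour placeholders are made up for illustration.

```python
def break_up(string, pairs, start = 0) :
  s_beg = -1
  s_end = -1
  for p in pairs :
    # Does this pair open earlier than our current best start?
    s_tmp = string.find(p[0], start)
    if s_tmp >= start and (s_tmp < s_beg or s_beg < 0) :
      # Ensure we don't meet an escape character
      if s_tmp > 0 and string[s_tmp - 1] == p[3] :
        continue
      s_beg = s_tmp
      # Try to find an end point
      s_end = string.find(p[1], s_beg + 1)
      if s_end >= start :
        while string[s_end - 1] == p[4] :
          s_end = string.find(p[1], s_end + 1)
          if s_end < 0 :
            s_end = len(string)
            break
        else :
          s_end += len(p[1])
      else :
        s_end = len(string)
  if s_beg < start or s_end < start :
    return [string[start:len(string)]]
  else :
    result = [string[start:s_beg], string[s_beg:s_end]]
    result += break_up(string, pairs, s_end)
    return result

# Same shape as s_rep in script_family(); the colour slot is unused here.
pairs = [
  ["\"", "\"", None, '\\', '\\'],
  ["'",  "'",  None, '\\', '\\'],
  ["#",  "\n", None, '\0', '\0'],
]

source = 'x = "a # b" # note\ny = 1'
parts = break_up(source, pairs)
print(parts)
# ['x = ', '"a # b"', ' ', '# note\n', 'y = 1']
```

The string opens before the first #, so it claims the # inside it. Only the second # starts a comment, and that comment runs up to and including the newline. Joining the parts back together gives the original text, which is what lets script_family() colour each piece separately.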

I wasted even more time trying to make it non-recursive, and whilst possible, it looked disgusting so I reverted to this.

  1. Line numbering - I want line numbers down the sides of the snippets and give people the ability to directly link to them. Perhaps in the future it could be useful to reference code blocks on the site.

0065 # line_markup()
0066 #
0067 # Generate line numbers with HTML links that can be referenced.
0068 #
0069 # @param string The string to be processed.
0070 # @return The string with the added HTML line numbers.
0071 def line_markup(string) :
0072   global line_counter
0073   lines = string.split("\n")
0074   string = ""
0075   for x in range(len(lines)) :
0076     line_counter += 1
0077     if x > 0 :
0078       string += "\n"
0079     string += (
0080       '<a class="disable" id="L' + str(line_counter) + '" href="#L' + str(line_counter) + '">' +
0081       fbeg + line_col + '">' + str(line_counter).rjust(4, '0') + fend +
0082       '</a>&nbsp;' + lines[x]
0083     )
0084   return string
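
Here is line_markup() run on two small snippets, as a rough sketch. The function is copied from the listing above without its line numbers. The post gives line_col but not fbeg or fend, so the span wrapper used for them here is an assumption.

```python
# Assumed wrappers; only line_col is given in the post.
fbeg = '<span style="color:'
fend = '</span>'
line_col = "#444444"
line_counter = 0

def line_markup(string) :
  global line_counter
  lines = string.split("\n")
  string = ""
  for x in range(len(lines)) :
    line_counter += 1
    if x > 0 :
      string += "\n"
    string += (
      '<a class="disable" id="L' + str(line_counter) + '" href="#L' + str(line_counter) + '">' +
      fbeg + line_col + '">' + str(line_counter).rjust(4, '0') + fend +
      '</a>&nbsp;' + lines[x]
    )
  return string

first = line_markup("a = 1\nb = 2")   # numbered 0001 and 0002
second = line_markup("c = 3")         # carries on at 0003
print(first)
print(second)
```

Because line_counter is global, numbering carries on from one snippet to the next. That is why the listings in this post count straight through rather than restarting at 0001, and it keeps every L anchor unique on the page.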

  1. Better colours - I still think there is room for improvement on the colours used.

Look at the perty colours:

0085 symbolother_col  = "#FF920D"
0086 symbolmath_col   = "#197BCE"
0087 symbolnumber_col = "#E32929"
0088 symbolpairs_col  = "#AF18DB"
0089 string_col       = "#008000"
0090 comment_col      = "#888888"
0091 line_col         = "#444444"

It’s not perfect, but it’s more than good enough for me. With a little CSS, you too could end up with an ugly site!