Just yesterday I wrote about a project to improve the code highlighting of this website. The source code for the project can be found here.
Some examples of different modes in different browsers…
It mostly works!
Check the source code of course, but I essentially did as I said I would:
- Simplification - That massive list of keywords is awkward, I think I can do better.
I ripped out the keywords. They were over the top and didn’t add value. I also hate the idea of having to curate and maintain such a long list every time C++ or friends pop out some random new feature.
- Comments - It would be nice to properly highlight comments in the source, such as /* this */ and // this.
Per language we can define some basic markup rules. In this case, we select anything inside (", "), (', ') and (#, \n) pairs as strings or comments. We define the colour we want to use and some escape characters to watch out for if possible (although not bulletproof).
    # script_family()
    #
    # Make script-family code look nice.
    #
    # @param string The string to be parsed.
    # @param rep_dict The replace dictionary.
    # @param esc_dict The escape dictionary.
    # @return The modified string.
    def script_family(string, rep_dict, esc_dict) :
        # Each row: start string, end string, colour, start escape, end escape
        s_rep = [
            ["\"", "\"", string_col, '\\', '\\'],
            ["'", "'", string_col, '\\', '\\'],
            ["#", "\n", comment_col, '\0', '\0'],
        ]
        string = break_up(string, s_rep)
        result = ""
        for s in string :
            updated = False
            for r in s_rep :
                if s.startswith(r[0]) :
                    result += fbeg + r[2] + '">' + multiple_replace(s, esc_dict) + fend
                    updated = True
                    break
            if not updated :
                result += multiple_replace(s, rep_dict)
        return result
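The multiple_replace() helper and the two dictionaries aren't shown in this post. As a rough idea of what the escaping step involves, here is a minimal sketch that assumes the escape dictionary only maps HTML-significant characters to entities and that the helper replaces every key in a single pass. The body of multiple_replace() and the contents of esc_dict below are illustrative guesses, not the project's actual code.

```python
import re

# Assumed escape dictionary: only the characters HTML treats specially.
esc_dict = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

def multiple_replace(text, rep_dict):
    """Replace every key of rep_dict found in text with its value, in one pass."""
    if not rep_dict:
        return text
    # Longest keys first, so that overlapping keys prefer the longer match.
    keys = sorted(rep_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: rep_dict[m.group(0)], text)

escaped = multiple_replace("if a < b && c > d :", esc_dict)
print(escaped)  # if a &lt; b &amp;&amp; c &gt; d :
```

Doing it in one pass matters: chaining str.replace() calls would escape the & in an already-inserted &lt; a second time.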
I wasted a bunch of time trying to figure out how to do the splitting itself (the break_up() call above). There’s not really any example code for this sort of thing… We’re parsing for multiple delimiter pairs that may appear inside one another, and whichever pair opens first wins. I really quite like the code I came up with for this:
    # break_up()
    #
    # Break up a given string into parts defined by start and end strings.
    #
    # NOTE: The escaping performed here is not bulletproof and is relatively dumb.
    #       It will fail at some edge cases, such as: "\\"
    #
    # @param string The string to be broken up.
    # @param pairs The start string, end string, <ignore>,
    def break_up(string, pairs, start = 0) :
        s_beg = -1
        s_end = -1
        for p in pairs :
            # Do we beat our current best start?
            s_tmp = string.find(p[0], start)
            if s_tmp >= start and (s_tmp < s_beg or s_beg < 0) :
                # Ensure we don't meet an escape character
                if s_tmp > 0 and string[s_tmp - 1] == p[3] :
                    continue
                s_beg = s_tmp
                # Try to find an end point
                s_end = string.find(p[1], s_beg + 1)
                if s_end >= start :
                    while string[s_end - 1] == p[4] :
                        s_end = string.find(p[1], s_end + 1)
                        if s_end < 0 :
                            s_end = len(string)
                            break
                    else :
                        # while-else: only runs if the loop ended without a break
                        s_end += len(p[1])
                else :
                    s_end = len(string)
        if s_beg < start or s_end < start :
            return [string[start:len(string)]]
        else :
            result = [string[start:s_beg], string[s_beg:s_end]]
            result += break_up(string, pairs, s_end)
            return result
I wasted even more time trying to make it non-recursive. Whilst possible, it looked disgusting, so I reverted to this.
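For comparison, a loop-based variant might look something like the sketch below. It is an illustration only, not the version that got thrown away: find_span() and break_up_iter() are hypothetical names, the colour slot in each pair is left as None because the splitter never reads it, and the escape handling is just as naive as in the recursive version.

```python
def find_span(string, pairs, start):
    """Return (begin, end) of the earliest delimited span at or after start, or None."""
    s_beg = -1
    s_end = -1
    for p in pairs:
        s_tmp = string.find(p[0], start)
        if s_tmp >= start and (s_tmp < s_beg or s_beg < 0):
            # Skip this pair if its start string is preceded by the escape character.
            if s_tmp > 0 and string[s_tmp - 1] == p[3]:
                continue
            s_beg = s_tmp
            # Walk forward past any escaped end strings.
            s_end = string.find(p[1], s_beg + 1)
            while s_end >= 0 and string[s_end - 1] == p[4]:
                s_end = string.find(p[1], s_end + 1)
            # No end found: the span runs to the end of the input.
            s_end = len(string) if s_end < 0 else s_end + len(p[1])
    return None if s_beg < 0 else (s_beg, s_end)

def break_up_iter(string, pairs):
    """Split string into alternating plain and delimited parts, without recursion."""
    parts = []
    start = 0
    while True:
        span = find_span(string, pairs, start)
        if span is None:
            parts.append(string[start:])
            return parts
        parts.append(string[start:span[0]])
        parts.append(string[span[0]:span[1]])
        start = span[1]

# Same layout as s_rep: start, end, colour (unused here), start escape, end escape.
pairs = [
    ['"', '"', None, '\\', '\\'],
    ["'", "'", None, '\\', '\\'],
    ["#", "\n", None, '\0', '\0'],
]

parts = break_up_iter('p("# no") # yes\n', pairs)
print(parts)  # ['p(', '"# no"', ') ', '# yes\n', '']
```

The sample shows the ordering at work: the # inside the quotes stays part of the string because the quote opens first, while the later # starts a real comment.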
- Line numbering - I want line numbers down the sides of the snippets, and to give people the ability to directly link to them. Perhaps in the future it could be useful to reference code blocks on the site.
    # line_markup()
    #
    # Generate line numbers with HTML links that can be referenced.
    #
    # @param string The string to be processed.
    # @return The string with the added HTML line numbers.
    def line_markup(string) :
        global line_counter
        lines = string.split("\n")
        string = ""
        for x in range(len(lines)) :
            line_counter += 1
            if x > 0 :
                string += "\n"
            string += (
                '<a class="disable" id="L' + str(line_counter) + '" href="#L' + str(line_counter) + '">' +
                fbeg + line_col + '">' + str(line_counter).rjust(4, '0') + fend +
                '</a> ' + lines[x]
            )
        return string
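To make the generated markup concrete, here is a small sketch of the prefix built for a single line. The values of fbeg and fend are not shown in this post, so the span tags below are an assumption chosen to fit how the two are concatenated, and number_anchor() is a hypothetical helper that only mirrors the string built inside line_markup().

```python
# Assumed values: fbeg and fend are not shown in the post.
fbeg = '<span style="color:'
fend = '</span>'
line_col = "#444444"  # taken from the colour table in the post

def number_anchor(n):
    """Hypothetical helper: the linkable line-number prefix for line n."""
    return (
        '<a class="disable" id="L' + str(n) + '" href="#L' + str(n) + '">' +
        fbeg + line_col + '">' + str(n).rjust(4, '0') + fend +
        '</a> '
    )

print(number_anchor(7) + "x = 1")
# <a class="disable" id="L7" href="#L7"><span style="color:#444444">0007</span></a> x = 1
```

Because each anchor's id matches its own href fragment, adding #L7 to the page URL jumps straight to that line.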
- Better colours - I still think there is room for improvement on the colours used.
Look at the perty colours:
    symbolother_col = "#FF920D"
    symbolmath_col = "#197BCE"
    symbolnumber_col = "#E32929"
    symbolpairs_col = "#AF18DB"
    string_col = "#008000"
    comment_col = "#888888"
    line_col = "#444444"
It’s not perfect, but it’s more than good enough for me. With a little CSS, you too could end up with an ugly site!