Coffee Space


Loader

loader.js is now used to convert pages on this site from a very strict raw markdown format to HTML. In the sections below we will deconstruct the code and explain how it works.

There have been some design changes to remove markdown’s many-to-one issue, where several different markers produce the same formatting. Headers are one example: putting a # before a word and putting a row of = characters underneath a word both result in an h1 HTML header.

There are also a few structural changes to the format to make it easier to build parsers for, one being the way in which lists are created. Now, exactly two spaces are required instead of just less than four. Again, this removes the idea that multiple different inputs lead to the same output - something I consider to be messy design.

Another change from the original specification is where HTML is escaped and where it is not. Only code indicated by four leading spaces is escaped; all other text follows the rules of HTML.

The last difference also makes parsing much easier: each line is treated separately from all others. This makes parsing quicker and simpler, but does lead to a high number of elements. It is thought that modern browsers are capable of handling this and should easily be able to abstract these away into lightweight elements in RAM.

The latest version of the code can be found here.

Aims

Hopefully the code you see below meets all of those expectations!

Use

Here are some simple cases in which the code can be used:

Headers

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

Note that, in keeping with the single-pass “per-line” model, rows of equals signs or minuses under text do not produce headers. Why markdown has two ways of doing this is not fully understood, as it makes documents more difficult to read.
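As a rough illustration of the per-line header rule, the sketch below converts a single line into a header. The name `headerLine` is hypothetical, and the regular expression is a simplified stand-in for the loader's own character loop shown later; it assumes one to six hashes followed by a single space.

```javascript
// Hypothetical helper, not part of loader.js: converts one line into a
// header, or returns null when the line is not a header.
function headerLine(line) {
  // One to six hashes, one space, then the header text.
  var match = /^(#{1,6}) (.*)$/.exec(line);
  if (match === null) {
    return null;
  }
  var level = match[1].length;
  return "<h" + level + ">" + match[2] + "</h" + level + ">";
}

console.log(headerLine("# Header 1"));   // <h1>Header 1</h1>
console.log(headerLine("### Header 3")); // <h3>Header 3</h3>
console.log(headerLine("Header"));       // null
```

Because the decision depends only on the start of the current line, no look-ahead to the following line is needed.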

Block Code

    Code after four spaces!

Code blocks begin after four leading spaces. Everything in them is escaped except xmp tags.
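A minimal sketch of that escaping step is given below. The name `escapeCodeLine` is hypothetical, and the sketch makes no attempt to model the xmp exception; it only shows the order in which the special characters are replaced.

```javascript
// Hypothetical helper, not part of loader.js: escapes a code-block line,
// or returns null when the line does not start with four spaces.
function escapeCodeLine(line) {
  if (line.slice(0, 4) !== "    ") {
    return null;
  }
  return line
    .slice(4)
    .split("&").join("&amp;") // ampersands first, so later entities survive
    .split("<").join("&lt;")
    .split(">").join("&gt;")
    .split('"').join("&quot;");
}

console.log(escapeCodeLine('    if (a < b && c) { print("x"); }'));
// if (a &lt; b &amp;&amp; c) { print(&quot;x&quot;); }
```

Replacing the ampersand first matters: doing it last would re-escape the ampersands inside the entities that were just produced.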

In-line Code

Code can also go `here` too.

In-line code is not escaped in any way. This allows for ease of processing, as multiple types of formatting can be used inside.

Lists

A list item is any line that starts with exactly two spaces, followed by a marker string containing no spaces, followed by a space. This allows for any type of numbering system the user chooses. Some examples are below:

  * Un-ordered lists
  * Un-ordered lists

  1. Numbered lists
  2. Numbered lists

  a. Lettered lists
  b. Lettered lists

  i. Roman lists
  ii. Roman lists

You get the point!
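The list rule can be sketched as a single pattern match. This is an illustration only: `parseListLine` is a hypothetical name, and the loader itself splits on spaces rather than using a regular expression.

```javascript
// Hypothetical helper, not part of loader.js: splits a list line into its
// marker and its text, or returns null when the line is not a list item.
function parseListLine(line) {
  // Exactly two spaces, a marker with no spaces, one space, then the text.
  var match = /^  (\S+) (.*)$/.exec(line);
  if (match === null) {
    return null;
  }
  return { marker: match[1], text: match[2] };
}

console.log(parseListLine("  * Un-ordered lists").marker); // *
console.log(parseListLine("  ii. Roman lists").marker);    // ii.
console.log(parseListLine("    Code, not a list"));        // null
```

Since the marker is any run of non-space characters, the parser needs no knowledge of bullets, numbers, letters or numerals.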

Images

Images are simply in the following format:

![Image alternative text](Image URL)
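For illustration, the image format maps onto an img tag as sketched below. The function name `imageLine` is hypothetical, and a regular expression is swapped in here for brevity; the loader's own code, shown later, walks the line with split and indexOf instead.

```javascript
// Hypothetical helper, not part of loader.js: replaces every
// ![alt](url) pattern in a line with an <img> tag.
function imageLine(line) {
  return line.replace(/!\[([^\]]*)\]\(([^)]*)\)/g, '<img alt="$1" src="$2">');
}

console.log(imageLine("See ![Logo](logo.png) here"));
// See <img alt="Logo" src="logo.png"> here
```

Text that does not complete the pattern, such as a missing closing bracket, is left untouched.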

Text Formatting

This has been chosen to be very simple and universal. The following options work well for formatting most text.

Bold
**Two stars either side of the text do the trick**
Italics
*One star either side of the text*
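The sketch below shows one way such paired markers can be turned into tags. `replacePairs` and `formatText` are hypothetical names; the approach (alternating opening and closing tags from left to right) is a simplification of the search loops in the loader's code shown later.

```javascript
// Hypothetical helper, not part of loader.js: replaces markers alternately
// with an opening and a closing tag, scanning left to right.
function replacePairs(line, marker, openTag, closeTag) {
  var parts = line.split(marker);
  var out = parts[0];
  for (var i = 1; i < parts.length; i++) {
    // Odd-numbered markers open the tag, even-numbered markers close it.
    out += (i % 2 === 1 ? openTag : closeTag) + parts[i];
  }
  return out;
}

function formatText(line) {
  // Bold must run first, otherwise "**" would be read as two "*" markers.
  line = replacePairs(line, "**", "<b>", "</b>");
  return replacePairs(line, "*", "<i>", "</i>");
}

console.log(formatText("**bold** and *italic*"));
// <b>bold</b> and <i>italic</i>
```

Note that an unmatched marker leaves a tag open in this sketch, so markers should always be written in pairs.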

Code

/**
 * Loader
 *
 * This loader is responsible for loading pages from source without blocking
 * the browser the code is run in. In this loader, pages are converted from
 * MarkDown to HTML for browser viewing. Please use the source below for
 * information on how the conversion is performed.
 *
 * NOTE: This is a simplified version of markdown and is designed to carry
 *       little if any complexity in order to increase usability and speed. If
 *       bugs are found, please contact B[].
 *
 * Features:
 *   * Headers
 *   * Lists
 *   * Code blocks
 *   * Links
 *   * Images
 *   * Bold
 *   * Italics
 *   * In-line code blocks
 *
 * @author B[]
 * @version 1.0.8
 **/

Just comments about the file.

/* ---- Constants ---- */

var TASK_BREAK_TIME_MS;
var TASK_PROCESS_TIME_MS;
var TAB_MAX;

Keep track of some basic information: TASK_BREAK_TIME_MS is the time to allow the browser to break for, and TASK_PROCESS_TIME_MS is the target time to process for. TAB_MAX stores the width, in characters, that list markers are padded to.

/* ---- Global Variables ---- */

var tasksScanned;
var tasksCompleted;

tasksScanned stores the number of tasks found whilst searching for tasks and tasksCompleted stores the number of tasks that have been run.

/* ---- Stack ---- */

var stackTimeout;
var taskStack;
var varStack;

stackTimeout is pre-calculated for the future timeout of the stack, instead of re-calculating every time. The taskStack is an array of tasks to be completed, accompanied by the varStack, containing useful variables for the taskStack. It’s important these remain in sync, otherwise tasks will get the wrong variables back!
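To make the pairing concrete, here is a minimal sketch. The names `taskQueue`, `varQueue` and `output` are local to this example; the helper functions mirror the loader's addTask(), addVar() and getVar() shown later, without the task counting. Although both arrays are called stacks, they are consumed first-in, first-out.

```javascript
// Minimal sketch of the task/variable pairing, using example-only names.
var taskQueue = [];
var varQueue = [];
var output = [];

function addTask(func) { taskQueue.push(func); }
function addVar(v) { varQueue.push(v); }
function getVar() { return varQueue.shift(); }

// Each task reads exactly the two variables that were queued for it.
["first", "second"].forEach(function (label, index) {
  addTask(function () {
    var name = getVar();
    var position = getVar();
    output.push(position + ":" + name);
  });
  addVar(label);
  addVar(index);
});

// Run the tasks in the order they were added.
taskQueue.forEach(function (task) { task(); });
console.log(output.join(", ")); // 0:first, 1:second
```

If a task read only one of its two variables, the leftover value would be handed to the next task, which is exactly the loss of sync described above.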

/* ---- Variables ---- */

var elements;

Keep a copy of the elements we are replacing; this can help reduce some errors in browsers with poor JS implementations.

/**
 * Loader()
 *
 * The task runner for the entire program. This function is responsible for
 * scanning the entire page and converting MarkDown to valid HTML.
 **/
function Loader(){
  /* Initialise tasking system */
  init();

Initialise the variables - we have no idea whether we have run this previously on the same page.

  /* Task payloads */
  elements = document.getElementsByName("md");
  for(var i = 0; i < elements.length; i++){
    /* Add task to remove element */
    addTask(function(){
      var elem = getVar();
      elem.innerHTML = "";
    });

Add a task to clear all of the elements, ready for our new data.

    /* Add reference to element to be cleaned */
    addVar(elements[i]);

The clearing tasks will need some elements to clean.

    /* Process the element lines */
    var lines = elements[i].innerHTML.split("\n");
    for(var e = 0; e < lines.length; e++){

For each line, perform our task processing (in the future).

      /* Add task to task stack */
      addTask(function(){

Add the function below as a task to be processed for each line. It’s relatively heavy, but you can guarantee the code will work each time.

        var elem = getVar();
        var line = getVar();
        var skip = false;
        /* <<<< Entire line tests >>>> */
        if(line.length == 0){
          line = "<br /><br />";
        }
        /* <<<< Start of line tests >>>> */
        if(line[0] == '#'){
          var temp = line;
          /* Find out type of header */
          var len = line.length;
          var h = 1;
          for(var z = 1; z < len; z++){
            if(line[z] == '#'){
              h++;
            }else{
              /* Make sure next character is space */
              if(line[z] == ' '){
                /* Remove previous markers */
                temp = line.slice(h + 1);
              }
              z = line.length;
            }
          }
          /* Add HTML */
          temp = "<h" + h + ">" + temp + "</h" + h + ">";
          /* Replace line for searching */
          line = temp;
        }
        if(line[0] == ' '){
          if(line[1] == ' '){
            /* Check whether we have a list or potential code block */
            if(line[2] == ' '){
              /* Check whether we have code block */
              if(line[3] == ' '){
                /* Escape the string */
                temp = line.slice(4);
                temp = temp.split('&').join("&amp;");
                temp = temp.split('<').join("&lt;");
                temp = temp.split('>').join("&gt;");
                temp = temp.split('"').join("&quot;");
                /* Check the length, add a space if zero */
                if(temp.length <= 0){
                  temp += ' ';
                }
                /* Throw some pre-tags around it */
                line = "<pre name=\"code\" style=\"margin:0px;\">" + temp + "</pre>";
                skip = true;
              }
            }else{
              /* Indent the list */
              var point = line.slice(2).split(" ");
              var pointLen = point[0].length;
              if(point[0] == "*"){
                point[0] = "&middot;&nbsp;";
              }
              var temp = "<tt name=\"list\">&nbsp;&nbsp;" + point[0];
              for(var z = point[0].length; z < TAB_MAX; z++){
                temp += "&nbsp;";
              }
              temp += "</tt>" + line.slice(2 + pointLen);
              line = temp + "<br />";
            }
          }
        }
        /* <<<< Middle of line tests >>>> */
        /* Only perform tests if we shouldn't be skipping */
        if(!skip){
          var temp = "";
          var images = line.split("![");
          if(!(images.length == 1 && !(images[0] == '!' && images[1] == '['))){
            for(var z = 0; z < images.length; z++){
              var endS = images[z].indexOf(']');
              var begC = images[z].indexOf('(', endS);
              var endC = images[z].indexOf(')', begC);
              /* If invalid, skip over */
              if(endS < 0 || begC < 0 || endC < 0 || endS + 1 != begC){
                /* Put everything back as it was */
                if(z > 0){
                  temp += "![";
                }
                temp += images[z];
              }else{
                temp += "<img alt=\"";
                temp += images[z].slice(0, endS);
                temp += "\" src=\"";
                temp += images[z].slice(begC + 1, endC);
                temp += "\">";
                /* Add everything that wasn't part of the breakup */
                temp += images[z].slice(endC + 1);
              }
            }
            line = temp;
          }
          temp = "";
          var links = line.split("[");
          if(!(links.length == 1 && line[0] != '[')){
            for(var z = 0; z < links.length; z++){
              var endS = links[z].indexOf(']');
              var begC = links[z].indexOf('(', endS);
              var endC = links[z].indexOf(')', begC);
              /* If invalid, skip over */
              if(endS < 0 || begC < 0 || endC < 0 || endS + 1 != begC){
                /* Put everything back as it was */
                if(z > 0){
                  temp += "[";
                }
                temp += links[z];
              }else{
                temp += "<a href=\"";
                temp += links[z].slice(begC + 1, endC);
                temp += "\">";
                temp += links[z].slice(0, endS);
                temp += "</a>";
                /* Add everything that wasn't part of the breakup */
                temp += links[z].slice(endC + 1);
              }
            }
            line = temp;
          }
          var pos = 0;
          while(pos >= 0){
            /* Search for first instance */
            pos = line.indexOf("**");
            if(pos >= 0){
              /* Replace first instance */
              line = line.slice(0, pos) + "<b>" + line.slice(pos + 2);
              /* Search for second instance */
              pos = line.indexOf("**");
              if(pos >= 0){
                /* Replace second instance */
                line = line.slice(0, pos) + "</b>" + line.slice(pos + 2);
              }
            }
          }
          pos = 0;
          while(pos >= 0){
            /* Search for first instance */
            pos = line.indexOf("*");
            if(pos >= 0){
              /* Replace first instance */
              line = line.slice(0, pos) + "<i>" + line.slice(pos + 1);
              /* Search for second instance */
              pos = line.indexOf("*");
              if(pos >= 0){
                /* Replace second instance */
                line = line.slice(0, pos) + "</i>" + line.slice(pos + 1);
              }
            }
          }
          pos = 0;
          while(pos >= 0){
            /* Search for first instance */
            pos = line.indexOf("`");
            if(pos >= 0){
              /* Replace first instance */
              line = line.slice(0, pos) + "<pre class=\"inline\">" + line.slice(pos + 1);
              /* Search for second instance */
              pos = line.indexOf("`");
              if(pos >= 0){
                /* Replace second instance */
                line = line.slice(0, pos) + "</pre>" + line.slice(pos + 1);
              }
            }
          }
        }
        /* Add line to element */
        elem.innerHTML += line;
      });
      /* Add reference to elements */
      addVar(elements[i]);
      /* Allow function to access line */
      addVar(lines[e]);

Add the required variables so that each task can reference its element and its line.

    }
    /* Add task to swap elements XMP for P */
    addTask(function(){
      var elem = getVar();
      var nElem = document.createElement('p');
      nElem.innerHTML = elem.innerHTML;
      elem.parentNode.insertBefore(nElem, elem);
      elem.parentNode.removeChild(elem);
    });

Replace the xmp tags with p tags. We want some pretty formatting after all.

    /* Add reference to element to be cleaned */
    addVar(elements[i]);

Replacing the xmp tags will require a reference to them.

  }
  /* Process tasks */
  process();

Finally, let’s start actually completing some tasks!

}

/**
 * init()
 *
 * The initialiser for the tasking system.
 **/
function init(){
  /* Allow the browser to process other tasks */
  TASK_BREAK_TIME_MS = 128;

A power of two will do nicely to allow the browser time to recover.

  /* Time to process the tasks for */
  TASK_PROCESS_TIME_MS = 256;

Again, a power of two for the processing time, double that of TASK_BREAK_TIME_MS; otherwise it may not even be worth coming back alive, given the overhead of preparing the next task.

  /* Record progress completion */
  tasksScanned = 0;
  tasksCompleted = 0;

Reset the task status variables, who knows - this may not be our first rodeo on this page!

  /* The overall task stack */
  taskStack = [];
  /* The variable stack */
  varStack = [];

Empty out the stacks, we don’t want the possibility of processing any previous tasks.

  /* Set tab space size */
  TAB_MAX = 4;

Set the tab size to four; list markers are padded out to this width.

}

/**
 * addTask()
 *
 * Adds task to back of task list and increment task count.
 **/
function addTask(func){
  taskStack.push(func);
  tasksScanned++;
}

This function simply adds tasks in a well defined way. In the future it may be hashed in some way to make searching easier.

/**
 * addVar()
 *
 * Adds a variable to the variable stack.
 **/
function addVar(v){
  varStack.push(v);
}

This simply allows variables to be added for the current task. Multiple variables may be added for the task, but they all must be read using getVar() in order to correctly allow the next task to read its variables.

/**
 * getVar()
 *
 * Gets a variable and removes it from the list.
 **/
function getVar(){
  var r = varStack[0];
  varStack.shift();
  return r;
}

The getVar() function makes sure that the used variables are removed from the array.

/**
 * process()
 *
 * Process tasks.
 **/
function process(){
  /* Make temporary date variable */
  var now = new Date();

We will use the now variable to get the current time for this process loop.

  /* Set the stack time out for the future */
  stackTimeout = now.getTime() + TASK_PROCESS_TIME_MS;

Pre-calculate when we need to stop processing the stack and store that information in the global variable stackTimeout.

  /* Iterate over tasks that remain */
  var i = 0;
  var run = true;
  for(; i < taskStack.length && run == true; i++){

Start processing through the stack.

    now = new Date();

Regenerate the current time.

    /* Check whether we have run our time */
    if(now.getTime() >= stackTimeout){

Make sure that we haven’t over-run our allowed time to process the tasks.

      /* Break out of the loop */
      run = false;
      /* Decrement indexing */
      i--;
    }else{
      /* Run next stack item */
      taskStack[i]();
      /* Increment number of tasks complete */
      tasksCompleted++;

Process the next task if we have time and record the fact we processed it in tasksCompleted.

    }
  }
  console.log("Time Took: " + (now.getTime() + TASK_PROCESS_TIME_MS - stackTimeout) + "ms");

Print a message about the progress for debug purposes. If other people have issues with browser freezing, this information is likely to be useful to them.

  /* When we get here, removed processed items from the stack */
  var tempStack = taskStack.slice(i);
  taskStack = tempStack;
  console.log("Stack Remaining = " + taskStack.length);

Remove stack items that have been used up and record how much is still left to be processed.

  /* Register break out time callback if more processing required */
  if(taskStack.length > 0){
    setTimeout(function(){ process(); }, TASK_BREAK_TIME_MS);
  }

Register our interest in being called back if more processing is required.

}
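The time-slicing idea in process() can be sketched in isolation as follows. This is not the loader's code: `runSliced` is a hypothetical name, and the `now` and `schedule` parameters are assumptions added so that the behaviour can be exercised with a fake clock instead of a real browser. By default they fall back to Date.now and setTimeout.

```javascript
// Hypothetical sketch of time-sliced processing, not part of loader.js.
function runSliced(tasks, processMs, breakMs, now, schedule) {
  now = now || Date.now;
  schedule = schedule || setTimeout;
  var deadline = now() + processMs;
  var done = 0;
  // Run tasks until the list is empty or the time budget is spent.
  while (done < tasks.length && now() < deadline) {
    tasks[done]();
    done++;
  }
  var remaining = tasks.slice(done);
  // Yield to the browser, then continue with whatever is left.
  if (remaining.length > 0) {
    schedule(function () {
      runSliced(remaining, processMs, breakMs, now, schedule);
    }, breakMs);
  }
  return remaining.length;
}

// Demo with a fake clock that advances 100 ms per reading and a scheduler
// that only records the callback instead of waiting.
var fakeTime = 0;
var pending = [];
var log = [];
var tasks = [1, 2, 3, 4, 5].map(function (n) {
  return function () { log.push(n); };
});
var left = runSliced(tasks, 256, 128,
  function () { fakeTime += 100; return fakeTime; },
  function (callback) { pending.push(callback); });
console.log(left, log.join(",")); // 3 1,2
```

With the fake clock, only two tasks fit into the first 256 ms slice; the remaining three wait in the recorded callback, much as the loader's remaining tasks wait for setTimeout to fire.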

Conclusion

Whilst I could have these pages pre-generated and save the end user the hassle and added complexity of building their own pages, I think there is some value in giving the end user a raw information format that is in theory timeless. Another positive of this method is that the size of the pages comes down dramatically and they compress better. With this format, the added size of HTML tags is avoided, which in theory brings the page size down.

From a design perspective, this method forces pages to follow similar if not identical formatting rules, meaning that the site remains coherent despite small changes in stylisation and changes in direction over time.