Coffee Space – Coffee Space

Elk Multitask

Background

For a while now I have been thinking about the problem of running multiple basic applications on very low-end hardware, in a safe and efficient manner. This idea probably first came to me when I was looking at the PineTime a while back. Lup Yuen and I were discussing the possible of running multiple apps on the smart watch at the same time using something like WASM, although the porting process is non-trivial ¹.

Possibly from this line of enquiry, I have been looking to build a Linux PDA for a while now. I want a simple open source personal device with limited capability. I did begin to explore the possibility of using the Pine Cube, unfortunately the Pine Cube did not work out as a platform due to the lack of touchscreen support - but also the form factor was entirely wrong anyway. I spoke about multiple options at the end of that article, but all of them are quite hardware intensive.

I recently got pointed at the m5paper device, a $100 e-ink portable device with battery, touch display and WiFi. One caveat though: It has an ESP32. This would not be a Linux device on such low-end hardware.

Thinking back to this idea of running some form of byte code, I again thought that it could be possible to run some form of high-level language that interacts with systems both on the Linux desktop and on this low-end device.

The obvious choice for me was Java - it was literally designed with portability in mind.

First I checked out NanoVM, it’s very much a contender requiring less than 8kB of flash memory, at least 512 bytes of code storage for Java byte code and at least 768 bytes of RAM to run. It’s a little on the high side, but manageable. There are of course some downsides as to why I did not use it… There is no native port to the ESP32, meaning I would need to entirely port it myself. It’s not at all clear the generated code is actually so portable anyway and it relies heavily on custom C functions and a decent chunk of the Java core library needing to be rewritten for the platform.

Next I considered uJ, a JVM for microcontrollers. This implementation appears to take up about 80kB of code space and also requires several kilobytes of RAM. There is currently no ESP32 implementation (people asked for help in the comments but got no reply) and it looks like it would be a pain to actually implement.

I also considered microej, but this appears to not be open source and therefore against my goal, despite appearing to have a pretty decent JVM implementation for the ESP32 (or at least some form of development board based on it).

Enter Hacker News. I saw a post named: ‘Elk: A low footprint JavaScript engine for embedded systems’ - interesting! I wasn’t expecting so much, but I checked out the GitHub repository anyway. Apparently the compiled code takes about 20kB of flash, and as few as 100 bytes for the VM! Okay, this is getting very interesting.

Elk

Elk is surprisingly cool and simple. The abstract hooked me:

Elk is a tiny embeddable JavaScript engine that implements a small but usable subset of ES6. It is designed for microcontroller development. Instead of writing firmware code in C/C++, Elk allows to develop in JavaScript. Another use case is providing customers with a secure, protected scripting environment for product customisation.

The feature set was also super interesting:

Elk features include:

Cross platform. Works anywhere from 8-bit microcontrollers to 64-bit servers

Zero dependencies. Builds cleanly by ISO C or ISO C++ compilers

Easy to embed: just copy elk.c and elk.h to your source tree

Very small and simple embedding API

Can call native C/C++ functions from JavaScript and vice versa

Does not use malloc. Operates with a given memory buffer only

Small footprint: about 20KB on flash/disk, about 100 bytes RAM for core VM

No bytecode. Interprets JS code directly

It’s incredibly simple and seems to be able to run on basically anything - it looks as if it should run happily on either desktop or microcontroller happily, blissfully unaware of which it is operating on!

So what benefits would there to be using such a system? Firstly, anyone writing applications for the device would not need a compiler. All they need to do is run the code and they’re on their way. This really lowers the bar-to-entry for third-party applications.

What disadvantages are there? Of course, the cost of not being either a binary or bytecode is that there is some super speed disadvantage. As the readme page suggests:

0001 let a = 0;        // 97 milliseconds on a 16Mhz 8-bit Atmega328P (Arduino Uno and alike)
0002 while (a < 100)   // 16 milliseconds on a 48Mhz SAMD21
0003 a++;            //  5 milliseconds on a 133Mhz Raspberry RP2040
0004 //  2 milliseconds on a 240Mhz ESP32

Okay, it’s pretty slow. We can likely do some patches to speed it up, but we won’t be talking about order of magnitude here. Ideally we want to outsource as much of the heavy lifting as possible with C-based helper functions and avoid spending too long doing any real logic in the JS.

Requirements

So it passes the basic sniff test, what now? Well ideally we need it to be able to run multiple JS programs at once. We need some form of multi-tasking. Of course, we don’t have multiple cores available on most of the microcontrollers we will want to run on, so something smarter will be required!

One method to implement this could be to cooperative multitasking, where each JS program either yields (voluntarily gives up computing to allow another application to run). This can be achieved by either each program manually calling a yield() function, or a yield being called on behalf of the program when they access some external function. This of course relies on every application behaving well (no bugs) and every application being considerate of others. Given we want to lower the boundary to entry, it would make more sense to choose another method.

Another method to consider preemptive multitasking, where each application it “interrupted” manually and time is given to another. This means that even if one application is misbehaving or consuming excessive resources, it is possible for another application to get computation time.

A better architecture still would be to have some prioritization process. For an input-focussed embedded device, the most important application will likely be the one currently on the display with other “background” applications likely only needing occasional computation. It would be relatively easy to implement such a prioritization system.

Currently, Elk doesn’t support multitasking of any kind. There is an evaluation function that takes the entire JS code, processes it and then only returns once execution ends. This is obviously bad for sharing processing time, but also bad for preventing a badly behaving JS code from running indefinitely.

Multitasking

Okay, so how do we make some form of basic multitasking?

Library Changes

The way this normally works is that you run js_eval(), and this simply checks the length of the code and then runs js_eval_nogc(). This function is the meat grinder:

0005 static jsval_t js_eval_nogc(struct js *js, const char *buf, jsoff_t len) {
0006   jsval_t res = mkval(T_UNDEF, 0);
0007   js->tok = TOK_ERR;
0008   js->code = buf;
0009   js->clen = len;
0010   js->pos = 0;
0011   while (js->tok != TOK_EOF && !is_err(res)) {
0012     js->pos = skiptonext(js->code, js->clen, js->pos);
0013     if (js->pos >= js->clen) break;
0014     res = js_stmt(js, TOK_SEMICOLON);
0015   }
0016   return res;
0017 }

It initialises the engine and then runs the code until EOF (end of file), an error occurs or the code simply completes execution. It then returns the result (if there is one) and all is good. Seems simple enough.

The problem is, with this implementation we would need to implement some form of threading in order to have multiple JS programs executed at the same time. A thread is not so heavy for most things, but when you’re talking about a few kilobytes of RAM, a thread is a significantly heavy object.

We can of course split up the function js_eval_nogc() and get it to operate how we want! In elk.h I added two additional functions:

js_init() - Initialises the JS engine.
js_run() - Runs the JS code one section at a time.

In the header this looks like this:

0018 ...
0019 jsval_t js_eval(struct js *, const char *, size_t);  // Execute JS code
0020 void js_init(struct js *, const char *);             // Initialise JS code
0021 jsval_t js_run(struct js *);                         // Run JS code
0022 jsval_t js_glob(struct js *);                        // Return global object
0023 ...

For the implementation of js_init() we are simply “resetting” the JS engine like so:

0024 void js_init(struct js *js, const char *buf) {
0025   js->tok = TOK_ERR;
0026   js->code = buf;
0027   js->clen = strlen(buf);
0028   js->pos = 0;
0029 }

And then for execution function js_run() we are simply removing the loop:

0030 jsval_t js_run(struct js *js) {
0031   jsval_t res = mkval(T_UNDEF, 0);
0032   if (js->tok != TOK_EOF && !is_err(res)) {
0033     js->pos = skiptonext(js->code, js->clen, js->pos);
0034     if (js->pos >= js->clen) return res;
0035     res = js_stmt(js, TOK_SEMICOLON);
0036   }
0037   return res;
0038 }

This is not an efficient implementation as it calls mkval() on each call (which is simply a waste). We could also likely get rid of a lot of the checking.

Testing

And now for the test program, built on top of the test examples on their GitHub page:

0039 #include <stdio.h>
0040 #include "../elk/elk.c"
0041 
0042 char mem1[256];
0043 char mem2[256];
0044 char* src1 =
0045   "let a = 1;"
0046   "let b = 2;"
0047   "a = sum(a, b);"
0048   "b = sum(b, a);"
0049   "sum(a, b);";
0050 char* src2 =
0051   "let a = 1;"
0052   "while(a < 128){"
0053     "a = sum(a, a);"
0054   "}"
0055   "a;";
0056 
0057 /**
0058  * sum()
0059  *
0060  * Sum two integers together and return the result.
0061  *
0062  * @param a The first integer.
0063  * @param b The second integer. 
0064  * @return The summed integers. 
0065  **/
0066 int sum(int a, int b){ 
0067   return a + b;
0068 }
0069 
0070 /**
0071  * main()
0072  *
0073  * A test function for checking elk concurrency.
0074  *
0075  * @param argv The command line arguments.
0076  * @param argc The number of command line arguments.
0077  * @return The exit status of the program.
0078  **/
0079 int main(char** argv, int argc){
0080   /* Setup JS environments in the given memory */
0081   struct js* js1 = js_create(mem1, sizeof(mem1));
0082   struct js* js2 = js_create(mem2, sizeof(mem2));
0083   /* Give external C functions */
0084   js_set(js1, js_glob(js1), "sum", js_import(js1, (uintptr_t)sum, "iii"));
0085   js_set(js2, js_glob(js2), "sum", js_import(js2, (uintptr_t)sum, "iii"));
0086   /* Variable to return result */
0087   jsval_t r1;
0088   jsval_t r2;
0089   /* Initialise the code */
0090   js_init(js1, src1);
0091   js_init(js2, src2);
0092   /* Run main loop */
0093   bool running = true;
0094   while(running){
0095     running = false;
0096     if(js1->tok != TOK_EOF && js1->pos < js1->clen){
0097       printf("js1: %i\n", js1->pos);
0098       r1 = js_run(js1);
0099       running = true;
0100     }
0101     if(js2->tok != TOK_EOF && js2->pos < js2->clen){
0102       printf("js2: %i\n", js2->pos);
0103       r2 = js_run(js2);
0104       running = true;
0105     }
0106   }
0107   printf("result1: %s\n", js_str(js1, r1));
0108   printf("result2: %s\n", js_str(js2, r2));
0109   return 0;
0110 }

The implementation is pretty simple, it just runs both JS engines until both of them are complete. The sum() function ran in the JS is a native C function, which proves that we can leverage C functions for greater speed, despite just operating in JS.

The programs being run are pretty simple on purpose. For the first engine, we run the following JS:

0111 let a = 1;
0112 let b = 2;
0113 a = sum(a, b);
0114 b = sum(b, a);
0115 sum(a, b);

This program simply does the following:

$a = 1$
$b = 2$
$a = a + b = 1 + 2 = 3$
$b = b + a = 2 + 3 = 5$
Return: $a + b = 5 + 3 = 8$

For the second program, we wanted to do something different (to prove the executions do not interfere with one another) and run for slightly longer:

0116 let a = 1;
0117 while(a < 128){
0118  a = sum(a, a);
0119 }
0120 a;

This program simply does the following:

$a = 1$
$a = a + a = 1 + 1$
Repeat the previous step until $a \ge 128$
Return: $a \equiv 128$

Results

So now for the big question: Does it work?

0121 $ ./elktest
0122 js1: 0
0123 js2: 0
0124 js1: 10
0125 js2: 10
0126 js1: 20
0127 js2: 10
0128 js1: 34
0129 js2: 10
0130 js1: 48
0131 js2: 10
0132 js2: 10
0133 js2: 10
0134 js2: 10
0135 js2: 10
0136 js2: 40
0137 result1: 8
0138 result2: 128

Yes. We can clearly see that it swaps back and fourth between the two different JS ‘programs’, finishing the execution of the first program and finally displaying the results once the second program has also finished running.

I am very sure it is not perfect - I’m pretty sure that it will not evenly distribute time across the two different programs, but for an initial test, it does in fact seem to be plausible to ‘multitask’ on a single core in limited memory - cool!

Going Forwards

Going forwards, I believe this could very much be a viable way forwards. It seems as though each application could in theory run in limited memory and not run the risk of crashing neighbouring applications. Also, given the ability to bind C functions, each application could feasibly leverage the speed of C functions and run just surface level logic in JS.

I will very much be considering using this in a PDA implementation. In theory this could allow for easy syncing between a desktop and microcontroller implementation, running near identical code. It would also allow for the rapid implementation of new apps and programs on the PDA device.

The end of this month going well, I think I will look towards treating myself to such a device as the m5paper (subject to availability). Let’s call it a celebration and early Christmas present if that helps.

Unfortunately nothing ever came from our discussions, but I still believe it could be something worth exploring.↩︎