Coffee Space


Listen:

Elk Multitask

Preview Image

Preview Image

Background

For a while now I have been thinking about the problem of running multiple basic applications on very low-end hardware, in a safe and efficient manner. This idea probably first came to me when I was looking at the PineTime a while back. Lup Yuen and I were discussing the possible of running multiple apps on the smart watch at the same time using something like WASM, although the porting process is non-trivial 1.

Possibly from this line of enquiry, I have been looking to build a Linux PDA for a while now. I want a simple open source personal device with limited capability. I did begin to explore the possibility of using the Pine Cube, unfortunately the Pine Cube did not work out as a platform due to the lack of touchscreen support - but also the form factor was entirely wrong anyway. I spoke about multiple options at the end of that article, but all of them are quite hardware intensive.

I recently got pointed at the m5paper device, a $100 e-ink portable device with battery, touch display and WiFi. One caveat though: It has an ESP32. This would not be a Linux device on such low-end hardware.

Thinking back to this idea of running some form of byte code, I again thought that it could be possible to run some form of high-level language that interacts with systems both on the Linux desktop and on this low-end device.

The obvious choice for me was Java - it was literally designed with portability in mind.

First I checked out NanoVM, it's very much a contender requiring less than 8kB of flash memory, at least 512 bytes of code storage for Java byte code and at least 768 bytes of RAM to run. It's a little on the high side, but manageable. There are of course some downsides as to why I did not use it... There is no native port to the ESP32, meaning I would need to entirely port it myself. It's not at all clear the generated code is actually so portable anyway and it relies heavily on custom C functions and a decent chunk of the Java core library needing to be rewritten for the platform.

Next I considered uJ, a JVM for microcontrollers. This implementation appears to take up about 80kB of code space and also requires several kilobytes of RAM. There is currently no ESP32 implementation (people asked for help in the comments but got no reply) and it looks like it would be a pain to actually implement.

I also considered microej, but this appears to not be open source and therefore against my goal, despite appearing to have a pretty decent JVM implementation for the ESP32 (or at least some form of development board based on it).

Enter Hacker News. I saw a post named: 'Elk: A low footprint JavaScript engine for embedded systems' - interesting! I wasn't expecting so much, but I checked out the GitHub repository anyway. Apparently the compiled code takes about 20kB of flash, and as few as 100 bytes for the VM! Okay, this is getting very interesting.

Elk

Elk is surprisingly cool and simple. The abstract hooked me:

Elk is a tiny embeddable JavaScript engine that implements a small but usable subset of ES6. It is designed for microcontroller development. Instead of writing firmware code in C/C++, Elk allows to develop in JavaScript. Another use case is providing customers with a secure, protected scripting environment for product customisation.

The feature set was also super interesting:

Elk features include:

  • Cross platform. Works anywhere from 8-bit microcontrollers to 64-bit servers
  • Zero dependencies. Builds cleanly by ISO C or ISO C++ compilers
  • Easy to embed: just copy elk.c and elk.h to your source tree
  • Very small and simple embedding API
  • Can call native C/C++ functions from JavaScript and vice versa
  • Does not use malloc. Operates with a given memory buffer only
  • Small footprint: about 20KB on flash/disk, about 100 bytes RAM for core VM
  • No bytecode. Interprets JS code directly

It's incredibly simple and seems to be able to run on basically anything - it looks as if it should run happily on either desktop or microcontroller happily, blissfully unaware of which it is operating on!

So what benefits would there to be using such a system? Firstly, anyone writing applications for the device would not need a compiler. All they need to do is run the code and they're on their way. This really lowers the bar-to-entry for third-party applications.

What disadvantages are there? Of course, the cost of not being either a binary or bytecode is that there is some super speed disadvantage. As the readme page suggests:

let a = 0;        // 97 milliseconds on a 16Mhz 8-bit Atmega328P (Arduino Uno and alike)
while (a < 100)   // 16 milliseconds on a 48Mhz SAMD21
  a++;            //  5 milliseconds on a 133Mhz Raspberry RP2040
                  //  2 milliseconds on a 240Mhz ESP32

Okay, it's pretty slow. We can likely do some patches to speed it up, but we won't be talking about order of magnitude here. Ideally we want to outsource as much of the heavy lifting as possible with C-based helper functions and avoid spending too long doing any real logic in the JS.

Requirements

So it passes the basic sniff test, what now? Well ideally we need it to be able to run multiple JS programs at once. We need some form of multi-tasking. Of course, we don't have multiple cores available on most of the microcontrollers we will want to run on, so something smarter will be required!

One method to implement this could be to cooperative multitasking, where each JS program either yields (voluntarily gives up computing to allow another application to run). This can be achieved by either each program manually calling a yield() function, or a yield being called on behalf of the program when they access some external function. This of course relies on every application behaving well (no bugs) and every application being considerate of others. Given we want to lower the boundary to entry, it would make more sense to choose another method.

Another method to consider preemptive multitasking, where each application it "interrupted" manually and time is given to another. This means that even if one application is misbehaving or consuming excessive resources, it is possible for another application to get computation time.

A better architecture still would be to have some prioritization process. For an input-focussed embedded device, the most important application will likely be the one currently on the display with other "background" applications likely only needing occasional computation. It would be relatively easy to implement such a prioritization system.

Currently, Elk doesn't support multitasking of any kind. There is an evaluation function that takes the entire JS code, processes it and then only returns once execution ends. This is obviously bad for sharing processing time, but also bad for preventing a badly behaving JS code from running indefinitely.

Multitasking

Okay, so how do we make some form of basic multitasking?

Library Changes

The way this normally works is that you run js_eval(), and this simply checks the length of the code and then runs js_eval_nogc(). This function is the meat grinder:

static jsval_t js_eval_nogc(struct js *js, const char *buf, jsoff_t len) {
  jsval_t res = mkval(T_UNDEF, 0);
  js->tok = TOK_ERR;
  js->code = buf;
  js->clen = len;
  js->pos = 0;
  while (js->tok != TOK_EOF && !is_err(res)) {
    js->pos = skiptonext(js->code, js->clen, js->pos);
    if (js->pos >= js->clen) break;
    res = js_stmt(js, TOK_SEMICOLON);
  }
  return res;
}

It initialises the engine and then runs the code until EOF (end of file), an error occurs or the code simply completes execution. It then returns the result (if there is one) and all is good. Seems simple enough.

The problem is, with this implementation we would need to implement some form of threading in order to have multiple JS programs executed at the same time. A thread is not so heavy for most things, but when you're talking about a few kilobytes of RAM, a thread is a significantly heavy object.

We can of course split up the function js_eval_nogc() and get it to operate how we want! In elk.h I added two additional functions:

  • js_init() - Initialises the JS engine.
  • js_run() - Runs the JS code one section at a time.

In the header this looks like this:

...
jsval_t js_eval(struct js *, const char *, size_t);  // Execute JS code
void js_init(struct js *, const char *);             // Initialise JS code
jsval_t js_run(struct js *);                         // Run JS code
jsval_t js_glob(struct js *);                        // Return global object
...

For the implementation of js_init() we are simply "resetting" the JS engine like so:

void js_init(struct js *js, const char *buf) {
  js->tok = TOK_ERR;
  js->code = buf;
  js->clen = strlen(buf);
  js->pos = 0;
}

And then for execution function js_run() we are simply removing the loop:

jsval_t js_run(struct js *js) {
  jsval_t res = mkval(T_UNDEF, 0);
  if (js->tok != TOK_EOF && !is_err(res)) {
    js->pos = skiptonext(js->code, js->clen, js->pos);
    if (js->pos >= js->clen) return res;
    res = js_stmt(js, TOK_SEMICOLON);
  }
  return res;
}

This is not an efficient implementation as it calls mkval() on each call (which is simply a waste). We could also likely get rid of a lot of the checking.

Testing

And now for the test program, built on top of the test examples on their GitHub page:

#include <stdio.h>
#include "../elk/elk.c"

char mem1[256];
char mem2[256];
char* src1 =
  "let a = 1;"
  "let b = 2;"
  "a = sum(a, b);"
  "b = sum(b, a);"
  "sum(a, b);";
char* src2 =
  "let a = 1;"
  "while(a < 128){"
    "a = sum(a, a);"
  "}"
  "a;";

/**
 * sum()
 *
 * Sum two integers together and return the result.
 *
 * @param a The first integer.
 * @param b The second integer. 
 * @return The summed integers. 
 **/
int sum(int a, int b){ 
  return a + b;
}

/**
 * main()
 *
 * A test function for checking elk concurrency.
 *
 * @param argv The command line arguments.
 * @param argc The number of command line arguments.
 * @return The exit status of the program.
 **/
int main(char** argv, int argc){
  /* Setup JS environments in the given memory */
  struct js* js1 = js_create(mem1, sizeof(mem1));
  struct js* js2 = js_create(mem2, sizeof(mem2));
  /* Give external C functions */
  js_set(js1, js_glob(js1), "sum", js_import(js1, (uintptr_t)sum, "iii"));
  js_set(js2, js_glob(js2), "sum", js_import(js2, (uintptr_t)sum, "iii"));
  /* Variable to return result */
  jsval_t r1;
  jsval_t r2;
  /* Initialise the code */
  js_init(js1, src1);
  js_init(js2, src2);
  /* Run main loop */
  bool running = true;
  while(running){
    running = false;
    if(js1->tok != TOK_EOF && js1->pos < js1->clen){
      printf("js1: %i\n", js1->pos);
      r1 = js_run(js1);
      running = true;
    }
    if(js2->tok != TOK_EOF && js2->pos < js2->clen){
      printf("js2: %i\n", js2->pos);
      r2 = js_run(js2);
      running = true;
    }
  }
  printf("result1: %s\n", js_str(js1, r1));
  printf("result2: %s\n", js_str(js2, r2));
  return 0;
}

The implementation is pretty simple, it just runs both JS engines until both of them are complete. The sum() function ran in the JS is a native C function, which proves that we can leverage C functions for greater speed, despite just operating in JS.

The programs being run are pretty simple on purpose. For the first engine, we run the following JS:

let a = 1;
let b = 2;
a = sum(a, b);
b = sum(b, a);
sum(a, b);

This program simply does the following:

  1. a=1
  2. b=2
  3. a=a+b=1+2=3
  4. b=b+a=2+3=5
  5. Return: a+b=5+3=8

For the second program, we wanted to do something different (to prove the executions do not interfere with one another) and run for slightly longer:

let a = 1;
while(a < 128){
 a = sum(a, a);
}
a;

This program simply does the following:

  1. a=1
  2. a=a+a=1+1
  3. Repeat the previous step until a128
  4. Return: a128

Results

So now for the big question: Does it work?

$ ./elktest
js1: 0
js2: 0
js1: 10
js2: 10
js1: 20
js2: 10
js1: 34
js2: 10
js1: 48
js2: 10
js2: 10
js2: 10
js2: 10
js2: 10
js2: 40
result1: 8
result2: 128

Yes. We can clearly see that it swaps back and fourth between the two different JS 'programs', finishing the execution of the first program and finally displaying the results once the second program has also finished running.

I am very sure it is not perfect - I'm pretty sure that it will not evenly distribute time across the two different programs, but for an initial test, it does in fact seem to be plausible to 'multitask' on a single core in limited memory - cool!

Going Forwards

Going forwards, I believe this could very much be a viable way forwards. It seems as though each application could in theory run in limited memory and not run the risk of crashing neighbouring applications. Also, given the ability to bind C functions, each application could feasibly leverage the speed of C functions and run just surface level logic in JS.

I will very much be considering using this in a PDA implementation. In theory this could allow for easy syncing between a desktop and microcontroller implementation, running near identical code. It would also allow for the rapid implementation of new apps and programs on the PDA device.

The end of this month going well, I think I will look towards treating myself to such a device as the m5paper (subject to availability). Let's call it a celebration and early Christmas present if that helps.


  1. Unfortunately nothing ever came from our discussions, but I still believe it could be something worth exploring.