Coffee Space



JSON Ramble


Disclaimer: I've had a few drinks - I'm angry, upset and annoyed. This is me just offloading some random ideas that I'll likely regret in the morning!

Let me start with a simple statement: I like JSON. I've been using it for coming up to 10 years now, ever since it was introduced to me as part of a robotics project by a good mentor I was lucky to have. Back then (and likely still now), he was big on JavaScript and NodeJS - two things I don't share his passion for.

JSON has served me well over the years. I've used it for some large projects, including another robotics team and my PhD. I've even been foolish enough to write my own JSON parser in two different languages [1]. I've come to quite like it.

Generally I am a fan of a few things:

  1. C-style formatting - I love me some curly brackets for scoping, square brackets for arrays, speech marks for strings, etc. Whilst I know many newcomers may find it annoying to have to remember closing brackets, as a relatively seasoned programmer I appreciate structure and consistency. (None of this 'spaced-indentation for scope' rubbish, I'm looking at you, Python.)
  2. Array support - These days I think it is absolutely required for a configuration file to support arrays. I use them exceptionally often. What on earth we did before arrays is actually beyond me. A properties file, for example, is so basic in comparison.
  3. Massive support - These days pretty much every language can import a JSON configuration file.
  4. Better than alternatives - Compared to things like properties files, YAML, etc., JSON is pretty good. The syntactic sugar is minimal, yet easy to understand. People sometimes complain there is too much, yet even somebody who has never seen JSON before can figure it out (as I have shown a few times).

One thing I also like (but don't use often) is templating. For example, you define something like the following:

{
  "key": { "type": "int", "min": "0", "max": "100", "default": "50" }
}

Then, when you are parsing your configuration, you have an easy way to check that values are valid and to fall back to sane defaults if they are not. Your JSON configuration can then fail safe. With autonomous robotics, you sometimes want to change configuration while the robot is up and running - and sometimes you accidentally set a bad or insane value. The last thing you want is a powerful humanoid robot trying to kill itself or you!
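To make the template idea concrete, here is a minimal Python sketch. The function name `apply_template` is made up for illustration, and it only handles the `"int"` type from the example above: a missing, unparseable or out-of-range value falls back to the template's default rather than being clamped.

```python
import json

# The template from above; min, max and default are stored as strings.
TEMPLATE = json.loads("""
{
  "key": { "type": "int", "min": "0", "max": "100", "default": "50" }
}
""")

def apply_template(config: dict, template: dict) -> dict:
    """Return a checked copy of `config`; keys not in the template are dropped."""
    result = {}
    for name, spec in template.items():
        if spec["type"] != "int":
            raise NotImplementedError("this sketch only handles int entries")
        default = int(spec["default"])
        lo, hi = int(spec["min"]), int(spec["max"])
        raw = config.get(name, default)  # missing key -> default
        try:
            value = int(raw)
        except (TypeError, ValueError):
            value = default  # unparseable -> default
        if not lo <= value <= hi:
            value = default  # out of range -> default (fail safe, no clamping)
        result[name] = value
    return result

print(apply_template({"key": 500}, TEMPLATE))   # {'key': 50}
print(apply_template({"key": "75"}, TEMPLATE))  # {'key': 75}
```

Falling back to the default rather than clamping to `min`/`max` is a design choice; for a robot, a known-good value is arguably safer than an extreme one.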

Now for the next part...

Complaints

JSON, I like you and all, but we have some things to talk about.

Comments

No comment. Literally. There is no way to explain why something has been assigned the value it has. This is one of the major drawbacks of JSON as a configuration language. In a properties file you might do something like:

# This is set like this for reasons
key=value

In my opinion, the most obvious thing to do would have been to use C-style comments. I would have gone with:

{
  /* This is set like this for reasons */
  "key": "value"
}

Here /* marks the start of a comment and */ marks the end. I would avoid // comments, as they depend on line endings, and parsing them could get a little complicated with mixed Windows/Unix line endings, for example. This is the same approach CSS takes. In my opinion, supporting // comments would be a mistake.
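Since standard parsers reject comments, one workaround is to strip them before parsing. The Python sketch below (function names are hypothetical) removes /* ... */ comments that sit outside strings and then hands the result to the standard `json` module. Because it only looks for the two-character delimiters, it never has to care about line endings.

```python
import json

def strip_block_comments(text: str) -> str:
    """Remove /* ... */ comments that appear outside of JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # copy escaped char (e.g. \" or \\) verbatim
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError(f"unterminated comment starting at offset {i}")
            out.append(" ")  # keep the tokens on either side separated
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def loads_with_comments(text: str):
    return json.loads(strip_block_comments(text))

config = loads_with_comments('{\n  /* This is set like this for reasons */\n  "key": "value"\n}')
print(config)  # {'key': 'value'}
```

Tracking whether we are inside a string matters: a value such as "http://example.com/*x*/" must survive untouched.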

Redundancy

It's possible to set strings (at least in some parsers) as:

{
  "key": "value",
  'another-key': 'another-value'
}

Mixing and matching " and ' is definitely a mistake. Given that ' and the backtick ` are easily confused visually, I think it would be best to just use ". That is technically what the specification requires, but not every parser enforces it.
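Python's built-in `json` module is one parser that does follow the specification here, so a cheap way to enforce "double quotes only" is to validate against a strict parser. The helper name `is_strict_json` is just for illustration.

```python
import json

def is_strict_json(text: str) -> bool:
    """Return True if Python's standard json module accepts `text` as-is."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(is_strict_json('{"key": "value"}'))                  # True
print(is_strict_json("{'another-key': 'another-value'}"))  # False
```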

And if reducing confusion is the goal, there is another catch: JSON is UTF-8 by default. That sounds all well and good, but both a raw Unicode character and its \uXXXX escape are valid, so you may or may not need to decode a string depending on your application. I believe the default should have been ASCII, with all Unicode pre-escaped.
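A quick Python illustration of the two spellings: both decode to the same string, and the standard `json` module can be told to emit ASCII-only output with every non-ASCII character pre-escaped (`ensure_ascii=True`, which happens to be its default).

```python
import json

raw = '{"drink": "café"}'            # the raw UTF-8 character
escaped = '{"drink": "caf\\u00e9"}'  # the same character as a \uXXXX escape

# Both spellings decode to the identical Python value.
print(json.loads(raw) == json.loads(escaped))  # True

# ASCII-only output, with non-ASCII characters pre-escaped.
print(json.dumps({"drink": "café"}, ensure_ascii=True))   # {"drink": "caf\u00e9"}
print(json.dumps({"drink": "café"}, ensure_ascii=False))  # {"drink": "café"}
```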

Data Types

Numbers in JSON can be arbitrarily large - the grammar places no limit at all on their size or precision. Each library implements its own numerical parsing. Depending on the library, a number like this may or may not convert to something useful, and may or may not throw an error of some type:

{
  "key": 58962345984235890432756982347652735347594375624938759483574398572349573454398257023495704893
}
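To see how much this depends on the library, here is the number above in Python. Its `json` module maps JSON integers to arbitrary-precision ints, but many environments (JavaScript's `JSON.parse`, for one) store every number as a 64-bit float; passing `parse_int=float` imitates that. Passing `parse_int=str` and `parse_float=str` keeps the original digits, which is roughly the "everything is a string" approach.

```python
import json

text = '{"key": 58962345984235890432756982347652735347594375624938759483574398572349573454398257023495704893}'

# Python ints have arbitrary precision, so this value is exact.
exact = json.loads(text)["key"]

# Imitating a float-only parser: the low-order digits are lost.
lossy = json.loads(text, parse_int=float)["key"]
print(int(lossy) == exact)  # False

# Keep the raw digits as a string and let the application decide.
raw = json.loads(text, parse_int=str, parse_float=str)["key"]
print(raw == str(exact))  # True
```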

As there is no consensus, I would personally suggest that everything be a string, leaving the implementer to parse their own data types. I can imagine that some people may also want to add prefixes or suffixes to their numbers to indicate their type, such as:

{
  "binary": "b1100",
  /* Common format for addresses */
  "hex": "0xC",
  /* Common format for general hex */
  "also-hex": "Ch",
  /* Common format for hex colours */
  "again-hex": "#C",
  "byte": "12b",
  "integer": "12",
  "long": "12l",
  "float": "12.0f"
}

As you can see, there are tonnes of possible formats - some will be right for your application, some will not. Parsers should leave this to the implementer and only offer helper functions.
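As an example of the kind of helper a parser could offer, here is a hypothetical `parse_typed` function in Python that understands exactly the formats listed above (every one of which spells the value 12). The prefix and suffix rules are assumptions taken from that example, not any standard.

```python
def parse_typed(value: str):
    """Hypothetical helper: interpret the prefixed/suffixed number strings above."""
    if value.startswith("0x"):
        return int(value[2:], 16)   # "0xC"  (addresses)
    if value.startswith("#"):
        return int(value[1:], 16)   # "#C"   (hex colours)
    if value.startswith("b"):
        return int(value[1:], 2)    # "b1100" (binary)
    if value.endswith("h"):
        return int(value[:-1], 16)  # "Ch"   (general hex)
    if value.endswith("b"):
        n = int(value[:-1])         # "12b"  (byte)
        if not -128 <= n <= 127:
            raise ValueError(f"{value!r} does not fit in a signed byte")
        return n
    if value.endswith("l"):
        return int(value[:-1])      # "12l"  (long)
    if value.endswith("f"):
        return float(value[:-1])    # "12.0f" (float)
    return int(value)               # "12"   (plain integer)

for v in ["b1100", "0xC", "Ch", "#C", "12b", "12", "12l", "12.0f"]:
    print(v, parse_typed(v))
```

Notice that "b" means binary as a prefix but byte as a suffix, so the order of the checks is itself a policy decision. That is exactly why this belongs in the application rather than in the parser.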

This is especially true of exponents, written with E or e inside the number (e.g. 1e3). Most people aren't going to use this functionality, and it isn't obvious how it should be supported.
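Here is one example of that ambiguity. In Python's `json` module, any exponent turns the value into a float, even when it is mathematically a whole number; other libraries may well choose differently. Passing `parse_float=str` keeps the original spelling instead.

```python
import json

print(repr(json.loads("1000")))  # 1000
print(repr(json.loads("1e3")))   # 1000.0
print(repr(json.loads("1E+3")))  # 1000.0

# Strings-only approach: keep the text and decide later.
print(repr(json.loads("1e3", parse_float=str)))  # '1e3'
```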

And if that wasn't complex enough already, numbers can be signed. Strict JSON only allows a leading -, but + may appear in an exponent (1e+3), and some lenient parsers accept a leading + as well. Technically that applies to zero too, which can lead to awkward things like this:

{
  "num-1": "+0",
  "num-2": "-0"
}

Some languages implement numbers such that +0 and -0 are distinct values - ouch. Depending on your application, this may or may not be a bug. Do you want your parser to preserve the sign or not? Personally, I recommend dropping + altogether and leaving the zero case to the implementer. Some people may even want ±0!
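The signed-zero problem is easy to reproduce with Python's standard `json` module. A JSON integer -0 becomes a plain int 0, while -0.0 becomes an IEEE 754 float that compares equal to 0.0 yet still carries its sign bit. A leading + is rejected outright by this strict parser.

```python
import json
import math

int_zero = json.loads("-0")      # Python ints have no negative zero
float_zero = json.loads("-0.0")  # IEEE 754 floats keep the sign

print(int_zero, float_zero)           # 0 -0.0
print(float_zero == 0.0)              # True: the zeros compare equal...
print(math.copysign(1, float_zero))   # -1.0: ...but the sign is still there

try:
    json.loads("+0")
except json.JSONDecodeError as err:
    print("rejected:", err.msg)       # rejected: Expecting value
```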

Okay, okay, that must be it? Nope. Booleans. Depending on the parser, any of these may or may not be treated as 'true':

{
  "a": true,
  "b": True,
  "c": 1,
  "d": "true",
  "e": "True",
  "f": 3426435,
  "g": "random text"
}

Clearly we have a problem here. Again, it makes the most sense to leave this problem to the user of the parser, with helper functions and everything stored as a string. Data types are simply not universal.
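Such a helper could be as small as the hypothetical `parse_bool` below (a Python sketch, not any existing library's API). The caller chooses which spellings count as true or false, and anything unrecognised returns `None` instead of a guess.

```python
def parse_bool(value: str, truthy=("true",), falsy=("false",)):
    """Hypothetical helper: map a string-typed value to True, False or None."""
    text = value.strip()
    if text in truthy:
        return True
    if text in falsy:
        return False
    return None  # unrecognised: let the caller decide what to do

print(parse_bool("true"))                                # True
print(parse_bool("True"))                                # None
print(parse_bool("True", truthy=("true", "True", "1")))  # True
print(parse_bool("random text"))                         # None
```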

Versioning

In general, I believe it was arrogant not to offer some form of versioning, especially with so many 'arbitrary' implementation details. The perfect place for this would have been at the beginning, like this:

"cool-json-version": {
  "key": "value"
}

Failing that, if comments are supported you could even have:

/* cool-json-version */
{
  "key": "value"
}
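The comment-based version header could be handled with a small pre-parse step. The Python sketch below assumes a hypothetical format: the version name sits in a single /* ... */ comment on its own line before the opening brace. Files without a header are treated as plain JSON, and an unknown version is reported rather than guessed at.

```python
import json
import re

# Hypothetical header format: a one-line /* ... */ comment before the opening brace.
VERSION_HEADER = re.compile(r"\s*/\*\s*(.*?)\s*\*/")

def load_versioned(text: str, supported=("cool-json-version",)):
    """Return (version, data); version is None if there is no header comment."""
    match = VERSION_HEADER.match(text)
    if match is None:
        return None, json.loads(text)
    version = match.group(1)
    if version not in supported:
        raise ValueError(f"unsupported format version: {version!r}")
    # Only the header is stripped; the rest must be plain JSON.
    return version, json.loads(text[match.end():])

doc = '/* cool-json-version */\n{\n  "key": "value"\n}'
print(load_versioned(doc))  # ('cool-json-version', {'key': 'value'})
```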

General

And now for some general points about implementations:

  1. Stop crashing - Parser implementations crash. All. The. Time. That's the last thing you want a parser to do; it should be really hard to crash. Sure, it can report errors, but where possible it should recover, or at least give you the ability to carry on somehow.
  2. Unhelpful errors - When a parser does crash, it often gives a really terrible error. It doesn't explain where the bracket is missing, or which opening bracket it expected a closing one for.
  3. Return blank - If you try to access a value that isn't there, just return blank ("") or null. There is no need to crash the program because a value could not be found. Sure, there is exception handling, but that just means writing more code than you really want for accessing a simple configuration file.
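The three points above can be sketched together in Python with a hypothetical `load_config`/`get` pair. Parse failures come back as a message built from the line and column that `json.JSONDecodeError` records (`lineno`, `colno`, `msg`) instead of an exception. Missing values come back as a default (blank `""` unless told otherwise).

```python
import json

def load_config(text: str):
    """Return (config, error); on a parse failure, ({}, message) instead of raising."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        return {}, f"line {err.lineno}, column {err.colno}: {err.msg}"

def get(config, *path, default=""):
    """Follow a path of keys/indices; return `default` if anything is missing."""
    node = config
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and -len(node) <= key < len(node):
            node = node[key]
        else:
            return default
    return node

config, error = load_config('{"robot": {"joints": [10, 20]}}')
print(get(config, "robot", "joints", 1))    # 20
print(repr(get(config, "robot", "speed")))  # ''

broken, error = load_config('{"robot": {"joints": [10, 20]}')
print(error)  # reports where parsing stopped, e.g. "line 1, column ..."
```

Whether an empty dict is a safe thing to carry on with is application-specific; a robot might instead keep its last known-good configuration.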

Final

Rant over.

I still like JSON and I'm not recommending yet another standard. I don't believe I could do better. But for the most part, I will personally be using a subset of it and encourage others to do the same. It's strings only for me.


  1. This parser is terribly out of date and buggy, so please don't use the current version! If there is interest I will invest some time in writing another with a clearer mindset!