I recently read an article by Mark McNally, where he discusses running a website that can theoretically handle 4.2 million requests per day for just £4 per month. He achieves this with 1 CPU and 2GB of RAM. This of course got him to the front page of HackerNews and all the attention that brings with it [1].
The tool he introduces for testing, ApacheBench, is new to me, but it also comes pre-installed on my machine - awesome! To run a test:
ab -n <NUM_TESTS> -c <CONCURRENT_TESTS> <DOMAIN>
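For example, the same values used later in this article (1,000 requests, 100 of them concurrent) against my own site would look something like this - note that ab expects a path, so the trailing slash matters:

ab -n 1000 -c 100 https://coffeespace.org.uk/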
There are of course caveats in the tests that Mark performs, some of which he addresses and some of which he does not:
That all said, it beats zero testing! It at least allows us to compare changes to the website in some meaningful way: even if we cannot accurately estimate how much traffic we can handle, we can say with reasonable confidence whether it is more or less than some previous state.
Instead of testing a few selected web pages, I want to test all of the pages automatically. I therefore wrote a script to automate the data collection:
#!/bin/bash

DOMAIN="https://coffeespace.org.uk/"
RESULTS="perf-$(date +"%Y-%m-%d_%H-%M-%S").csv"

# main()
#
# The main entry function.
#
# @param $@ The command line parameters.
main() {
  # Loop over every generated HTML page
  for f in www/*.html www/**/*.html www/**/**/*.html; do
    echo "url -> $DOMAIN$f"
    ab -n 1000 -c 100 -e temp.csv "$DOMAIN$f"
    # Append the percentile results, skipping the CSV header
    tail -n +2 temp.csv >> "$RESULTS"
    rm temp.csv
  done
}

main "$@"
Note that I chose the same number of tests and concurrent connections as Mark did. Checking my network usage, I see about 650kB/s down and 150kB/s up.
Next I import the results into LibreOffice and generate some graphs…
A total of 211k tests were performed against the server, covering all .html pages in the statically generated part of the site. Most pages were delivered in about 9ms. If we round this up to 12ms, the server could serve in excess of 80 pages per second, or about 7 million pages per day!
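As a very rough sanity check on that figure (a back-of-envelope sketch which assumes requests are handled one at a time and ignores concurrency entirely):

# ~83 pages per second at 12ms per page, scaled up to a full day
echo $(( (1000 / 12) * 60 * 60 * 24 ))   # prints 7171200, i.e. roughly 7 million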
As I mentioned previously, this server has just 128MB of RAM for $1.50 per month. Suck it, Mark!
But if we look at the data, things are not quite so simple. As we keep hitting the server, the time to return a web page actually increases [2]. It’s likely that resources are being consumed on the server (such as RAM) and it is struggling to keep up [3].
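I have not verified this, but one way to check would be to log memory usage on the server itself while a test runs - a hypothetical sketch, sampling free memory once a second:

# Sample free memory (in MB) once a second; stop with Ctrl+C
while true; do
  echo "$(date +%H:%M:%S) $(free -m | awk '/^Mem:/ {print $4}') MB free"
  sleep 1
done >> memory.log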
It’s good to know that the server could handle about 100 requests per second and still stand up though.
Currently the testing is not ideal. There are a bunch of variables I cannot account for, such as:
Ideally we need better tools for more consistent tests, but ab is surely better than nothing.
Perhaps in the future I could look at implementing disk caching of some kind, where content is stored in a RAM disk (on a better server). Currently caching would not really speed up the statically served content.
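For reference, serving from a RAM disk needs very little setup - a minimal sketch, assuming a Linux server with spare memory and a static site living under /var/www/html (both assumptions on my part):

# Create a 64MB RAM-backed filesystem and copy the static site into it
sudo mkdir -p /var/www/ramdisk
sudo mount -t tmpfs -o size=64m tmpfs /var/www/ramdisk
sudo cp -r /var/www/html/. /var/www/ramdisk/
# The web server would then be pointed at /var/www/ramdisk instead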
I could also look to minify the resources, such as the CSS and JS. Currently they are entirely human readable.
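A sketch of what that could look like, assuming the npm tools csso-cli and terser are installed (I have not actually run this against the site):

# Minify every CSS and JS file, writing a .min.* copy alongside the original
shopt -s globstar nullglob
for f in www/**/*.css; do
  csso "$f" --output "${f%.css}.min.css"
done
for f in www/**/*.js; do
  terser "$f" --compress --mangle --output "${f%.js}.min.js"
done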
One thing I would like to look towards is providing a future-proof Data URL that encodes the meat of the web page as a link. This would be instantly loadable and always available offline. There is a generally accepted length limit of 65k bytes, which could be overcome by putting the text through compression, such as a ZIP file.
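A rough sketch of generating such a link for a single page (www/index.html is just a stand-in path, and this is the uncompressed form - a compressed variant would need something on the other end to inflate it):

# Encode a page as a base64 data: URL that a browser can open directly
printf 'data:text/html;charset=utf-8;base64,%s\n' \
  "$(base64 -w0 www/index.html)" > index.dataurl.txt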
Note to self: I need to stop performing tests; I’ve already used 11% (13GB) of the server’s 100GB monthly bandwidth - ouch!