Coffee Space – Coffee Space

Server Update

I was reading an Ask HN the other day about not using Docker (or other) containers on servers. I’ve never run containers on my servers. When I used AWS, I was paying per compute unit, so I kept computation as low as possible. Since then, I have jumped from cheap host to cheap host.

These days I keep the cost of the server under $12 per year, with 128 MB of RAM. With resources this low there is not really enough room for these container systems, especially as I run more than a few things.

How do you maintain portability? Quite simply, just better testing. When you push some new code, be prepared to have to jump into the code and perform some change. Ideally you test enough that there is a hell of a good chance that it works. Containers should not be used as a method to avoid testing. If the code fails, find out why - don’t just keep making random changes until it does work. Then, once you find the root cause, add it to your checklist. Eventually your pre-push testing will be so robust that you will rarely run into such issues.

How do you maintain security? Containers are not nearly as secure as you are led to believe. Encrypt everything (even RAM) and trust nothing. Another point is to avoid needing security at all costs: if you can avoid having to handle user input or security, do so.

Auto-Management

One thing you’ll find you want is for the server to automatically fetch and pull new changes from a Git repository. You’ll then want to build the code on the server (optimized for the local hardware) and run your new code.

The following is an example I use for C/C++. It has evolved over the years across various projects as the needs have changed.

#!/bin/bash

# String for process
PROC="PROGRAM_NAME"
STATS="/tmp/SERVER_STATS"

# Variables
n_clients="0"

Here we have the variables. PROGRAM_NAME is a unique identifier for your program running on the server. SERVER_STATS is a file of interesting stats about the current server state, which the server program writes out to /tmp/SERVER_STATS.

As you can see, in our example we are interested in n_clients (the number of connected clients). The SERVER_STATS file will contain something like:

n_clients="9"
n_authed="2"

The idea is that this information informs this script as to whether we should perform some action. In our case, we want to wait to restart the server application whilst there are connected clients.
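The server program is responsible for writing this file. As a rough sketch of the format only, and assuming the writer wants to avoid this script sourcing a half-written file, the helper below writes to a temporary file and then renames it into place. The name write_stats and its parameters are hypothetical; the real server would do the equivalent in C/C++.

```shell
#!/bin/bash

# write_stats() (hypothetical helper, not part of the original script)
#
# Write the stats to a temporary file in the same directory, then rename
# it over the real file so a reader never sees a partial write.
#
# @param $1 Path of the stats file.
# @param $2 Number of connected clients.
# @param $3 Number of authenticated clients.
function write_stats {
  local tmp="$1.tmp.$$"
  printf 'n_clients="%s"\nn_authed="%s"\n' "$2" "$3" >"$tmp" && mv "$tmp" "$1"
}
```

For example, write_stats /tmp/SERVER_STATS 9 2 would produce the two lines shown above.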

# log()
#
# Log to standard error a script related point of interest.
#
# @param $@ The message to be printed.
function log {
  # $* joins all arguments into one string inside the quotes
  echo "[$(date +%F_%H-%M-%S)] $*" >&2
}

Log points of interest to standard error. If the server program suddenly restarts, it is nice to be able to check the logs to find out why.

# read_config()
#
# Read the configuration file, otherwise set default values.
function read_config {
  log "[before] n_clients = $n_clients"
  if [ -f "$STATS" ]; then
    # Source the stats file so its variables overwrite ours
    . "$STATS"
  fi
  log "[after] n_clients = $n_clients"
}

A function to read our SERVER_STATS file, which we simply source. We log the before and after values as a sanity check in case we ever need to debug the reason for a restart.
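Sourcing the file runs whatever it contains as shell code, which is fine as long as only the server program writes it. If that is a concern, one possible alternative is to parse out just the value you need. This is a minimal sketch and not what the script above does; read_stat and its parameters are hypothetical names.

```shell
#!/bin/bash

# read_stat() (hypothetical helper, not part of the original script)
#
# Print the numeric value of one key from the stats file without
# executing the file. Prints the default if the file or key is missing.
#
# @param $1 Path of the stats file.
# @param $2 Key to look up, e.g. n_clients.
# @param $3 Default value.
function read_stat {
  local file="$1" key="$2" default="$3" line value=""
  # Match only lines that look exactly like key="123"
  local re="^${key}=\"([0-9]+)\"\$"
  if [ -f "$file" ]; then
    # The extra test handles a last line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
      if [[ "$line" =~ $re ]]; then
        value="${BASH_REMATCH[1]}"
      fi
    done <"$file"
  fi
  echo "${value:-$default}"
}
```

It could then be used as n_clients="$(read_stat "$STATS" n_clients 0)" in place of sourcing the file.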

# restart_process()
#
# Stop any existing process by the same name and then start a new one.
function restart_process {
  # Ensure we create a new stats file (-f: no error if it does not exist yet)
  rm -f "$STATS"
  # If process is running
  res="$(ps ax | grep "$PROC" | grep -v grep)"
  if [ -n "$res" ]; then
    # Only the first PID is taken, as we assume the name is unique
    pid="$(echo $res | awk '{print $1}')"
    log "Trying to kill process $pid"
    kill "$pid"
  fi
  log "Trying to start process $PROC"
  bash server.sh &
}

We assume the process name PROGRAM_NAME is unique. We attempt to find it, then kill it. We then restart our server script, in this case called server.sh. We wrap the server program in a small wrapper script simply because we may want to pass new parameters to the process.

NOTE: We purposely delete the SERVER_STATS file and wait for the server program to write a new one. If the server fails to reboot for some reason, we don’t want to fool ourselves with old data.

# rebuild_program()
#
# Rebuild the program.
function rebuild_program {
  root="$(pwd)"
  # Do not run make in the wrong directory if the source is missing
  cd "SOURCE_CODE" || return 1
  make clean && make install
  cd "$root"
}

Rebuild the program. In this case the build is just a simple Makefile.

# ensure_safe()
#
# Make sure the program is not in active use.
function ensure_safe {
  read_config
  # Keep waiting while any clients are still connected
  while [ "$n_clients" -ne 0 ]; do
    log "Waiting to rebuild and reboot server, n_clients = $n_clients"
    sleep 30
    read_config
  done
}

Loop and wait until it is safe to do something with the server. Once we hit this function we are simply waiting to perform a restart when the correct conditions are met. In this case, we wait for all of the clients to disconnect.
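One of the edge cases here is that the loop waits forever if a client never disconnects. A possible variation, sketched below under the assumption that you would rather give up after a while, adds a deadline. The name wait_until_idle and its parameters are hypothetical, and the client count is supplied by whatever command you pass in.

```shell
#!/bin/bash

# wait_until_idle() (hypothetical helper, not part of the original script)
#
# Wait until the given command reports zero clients, or give up once
# the maximum wait has passed.
#
# @param $1 Name of a command or function that prints the client count.
# @param $2 Maximum number of seconds to wait.
# @param $3 Seconds to sleep between checks (must be greater than zero).
# @return 0 if idle, 1 if we gave up.
function wait_until_idle {
  local check="$1" max_wait="$2" interval="$3" waited=0
  while [ "$("$check")" -ne 0 ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

What to do when it gives up (restart anyway, or skip this update) is a policy decision left to the caller.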

# Restart process by default
restart_process

The first thing we do is restart the server program. We kill any existing process and then start a fresh one.

# Infinite loop
while :
do
  # Fetch the latest changes
  git fetch
  # Check whether pull required (local HEAD differs from upstream)
  if [ "$(git rev-parse HEAD)" != "$(git rev-parse '@{u}')" ]; then
    # Pull the latest changes
    git pull
    # Rebuild the files
    ensure_safe
    rebuild_program
    # Restart the process
    restart_process
  else
    log "No changes"
  fi
  # Check if process is running
  res="$(ps ax | grep "$PROC" | grep -v grep)"
  if [ -z "$res" ]; then
    # As it's not running, rebuild and restart it
    make && restart_process
    # Do another loop shortly
    log "Failed build detected, quick reboot"
    sleep 30
  else
    # Sleep for 5 minutes and check again
    log "Nothing to do, sleeping"
    sleep 300
  fi
done

This is the main loop. Here we check whether there are changes in Git every 5 minutes (300 seconds, to be friendly to the Git server). If we detect changes, we pull in the latest code, wait until there are no active users, then rebuild and restart.

If there are no changes to pull in, we simply ensure the process is running. If the server program is not running, we shorten our checking loop to every 30 seconds.
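The change check in the loop can also be pulled out into its own function, which makes it easier to try by hand in a checkout. This is only a sketch of the same comparison; the name needs_pull is hypothetical, and it assumes the current branch has an upstream configured and that git fetch has already been run.

```shell
#!/bin/bash

# needs_pull() (hypothetical helper, not part of the original script)
#
# Compare the local HEAD with the upstream branch after a fetch.
# Note: this is also true when the local branch is ahead of or has
# diverged from upstream, exactly like the check in the main loop.
#
# @return 0 if the revisions differ, 1 if they match, 2 on a git error.
function needs_pull {
  local local_rev remote_rev
  local_rev="$(git rev-parse HEAD)" || return 2
  remote_rev="$(git rev-parse '@{u}')" || return 2
  [ "$local_rev" != "$remote_rev" ]
}
```

In the loop this would read: git fetch, then if needs_pull; then git pull, and so on.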

This is not foolproof and you should adjust this for your exact needs. Do not blindly use this script without first reading and understanding it.

Use at your own risk.

Improvements

In the future I would make the following improvements:

Unique ID - Each server should generate a unique ID so that they are uniquely addressable.
Better timestamp - Currently the timestamp only has 1 second accuracy, we can do better than this.
Log to disk - In case of a server crash, it would be great to log this to the disk.
Send log via email - If the process is failing to run or crashing it could be great to send an email to an engineer. In this case it would be great to send the log via email - along with the unique server ID.
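As a rough idea of how the timestamp and log-to-disk points might look, here is a sketch of a replacement log function. It is not part of the current script: the LOG_FILE variable and its default path are made up for illustration, and the %3N format (milliseconds) is a GNU date extension that may not be available on every system.

```shell
#!/bin/bash

# Hypothetical log location, override by setting LOG_FILE beforehand
LOG_FILE="${LOG_FILE:-/tmp/server_manager.log}"

# log()
#
# Log a message with a millisecond timestamp to standard error and
# append the same line to the log file on disk.
#
# @param $@ The message to be printed.
function log {
  # %3N (milliseconds) is a GNU date extension
  echo "[$(date +%F_%H-%M-%S.%3N)] $*" | tee -a "$LOG_FILE" >&2
}
```

Keeping the log in /tmp means it is usually lost on a machine reboot, so a persistent path may be a better choice on a real server.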

Anyway, I will continue to iterate on the design. I do not want to make it more complex - the benefit of this script is its incredible simplicity. It’s not smart enough to cause too much trouble. Sure, there are some edge cases, but they are fairly obvious edge cases.