Coffee Space

Whois Library

So, you want to use a whois library, eh? This is a little write-up on how a simple quest is actually not so simple…

Background

The whois tool is an old but gold tool written in 1999 as a replacement for other look-up tools. It is used to query various databases in order to find information about IP addresses (and domains). IPv4 blocks can be reallocated over time, and it can be useful to figure out whom a block belongs to, as well as other properties.

You can write something like:

$ whois 8.8.8.8

And find the information regarding the Google DNS IP address.

Since the GDPR came into effect, details of whom an IP address belongs to can be considered personal data, and therefore this database can no longer offer information as accurate as it used to.

Problem

The official whois repository was never meant to be a C library. Worse still, it does not easily lend itself to this function at all. It defines its own main() function, so includes are not trivial. Functions do not return data; they merely print it straight to stdout.

We could of course write our own implementation, but why duplicate efforts? What if for example somebody decides to rewrite how the RIPE database is interfaced with? Do we really want to burden ourselves with writing in this new implementation? The answer is likely ‘no’.

As many command line tools parse the output of whois, the most reliable thing about the implementation is likely that the printed output format does not change, even when the code behind it does. So in this case it is almost better to wrap the tool itself.

Other implementations address this by spawning a new process and then parsing the output. This works, but does suffer a few issues:

Not cross-platform - The whois implementations can differ from platform to platform, even within Unix environments.
Security risk - Spawning a process usually means handing a string to a shell. With a badly sanitised string you potentially give an attacker the ability to run commands at whichever privilege the process was spawned with.
Resources - Every time you run the whois binary, it has to be loaded from disk and initialise everything internally before it performs the fetch and prints the data you want to look at.
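
To make the security point concrete, here is a minimal sketch of how a process-spawning wrapper could avoid the shell entirely. The helper names is_safe_query() and spawn_whois() are hypothetical, and the character whitelist is an assumption about what a sensible IP or domain query looks like, not a complete validation scheme.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: accept only characters that appear in IP addresses,
 * CIDR blocks and host names, and reject a leading '-' so the query cannot
 * be mistaken for a command line option. */
int is_safe_query(const char *q){
  size_t i, n = strlen(q);
  if(n == 0 || n > 255 || q[0] == '-') return 0;
  for(i = 0; i < n; i++){
    unsigned char c = (unsigned char)q[i];
    if(!isalnum(c) && c != '.' && c != ':' && c != '-' && c != '/') return 0;
  }
  return 1;
}

/* Hypothetical helper: run `whois <query>` with fork()/execvp(), so no shell
 * ever interprets the string. Returns bytes captured, or -1 on error. */
ssize_t spawn_whois(const char *query, char *buf, size_t len){
  int fd[2];
  pid_t pid;
  ssize_t n;
  size_t total = 0;
  if(len == 0 || !is_safe_query(query) || pipe(fd) != 0) return -1;
  pid = fork();
  if(pid < 0){ close(fd[0]); close(fd[1]); return -1; }
  if(pid == 0){
    /* Child: send stdout down the pipe, then become the whois binary */
    char *args[] = { "whois", (char*)query, NULL };
    dup2(fd[1], STDOUT_FILENO);
    close(fd[0]);
    close(fd[1]);
    execvp("whois", args);
    _exit(127); /* Only reached if exec failed */
  }
  close(fd[1]);
  while(total < len - 1 && (n = read(fd[0], buf + total, len - 1 - total)) > 0)
    total += (size_t)n;
  buf[total] = '\0';
  close(fd[0]);
  waitpid(pid, NULL, 0);
  return (ssize_t)total;
}
```

This only addresses the security item. The output still depends on whichever whois binary the platform ships, and a new process is still loaded for every query.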

Maybe each of these problems can be addressed, but you see this is beginning to be a far from trivial process now.

Solution

The directory structure looks something like this:

./
  whois/ # Git submodule
    # Relevant files
  main.c
  whois.h

The whois submodule is simply the official repository, that can be added to a Git project with something like:

git submodule add https://github.com/rfc1036/whois.git

Note: The use of HTTPS and not SSH means that users of your repository do not need a GitHub account in order to add the submodules for your project.

So firstly, we define whois.h:

#define socklen_t int
#define ripeflags _ripeflags
#define ripeflagsp _ripeflagsp

#include "whois/whois.h"

The purpose here is to offer just about enough to allow our main program to build. We later link to the build as appropriate.

Next, we define our build process:

all : test

test : main.c whois_build
	$(CC) -g -c main.c
	$(CC) -g -o test main.o whois/whois.o whois/utils.o

clean : whois_clean
	-rm test

whois_build :
	cd whois; $(MAKE); rm whois.o
	cd whois; $(CC) -g -Dmain=_main -c ../whois.h whois.c utils.c

whois_clean :
	cd whois; $(MAKE) clean

A few things to note here:

The use of -g is just for debug symbols, which are useful during experiments. Feel free to remove them as you like!
We make use of the whois submodule’s Makefile to build the majority of the code for us. There is no point duplicating efforts here.
We then manually recompile whois.o without the main() function (it’s redefined to _main()), to prevent a redefinition of the main() function when linking in the test binary.
We then link everything together into test.

Note: It is important for main.c to have a proper main() function if you intend to use the standard libraries. Initially I was getting SEGFAULT issues after calling printf(), until I realised the standard C library had not been initialised, due to the fact we were using a different entry function. This is because in reality, main() is not the first function to be called, but instead a series of initialisation functions are called.

And now for the main program that uses the ‘library’:

/* Included before whois.h, as whois.h redefines socklen_t */
#include <unistd.h> /* dup(), dup2(), pipe(), read(), close() */

#include "whois.h"

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv){
  char* server = "whois.ripe.net";
  char* flags = "";
  char* query = "69.63.181.11";
  char* qstring = queryformat(server, flags, query);
  int sockfd;
  char* response;
  /* Capture output of stdout, see: https://stackoverflow.com/a/35249468/2847743 */
  int _stdout;
  _stdout = dup(fileno(stdout));
  int pipefd[2];
  pipe(pipefd); /* Same as pipe2(pipefd, 0), but needs no _GNU_SOURCE */
  dup2(pipefd[1], fileno(stdout));
  /* Query server using the formatted query string */
  sockfd = openconn(server, NULL);
  response = do_query(sockfd, qstring);
  /* Grab the response of the function */
  fflush(stdout);
  close(pipefd[1]);
  dup2(_stdout, fileno(stdout));
  char buf[4096 + 1];
  /* read() does not NUL-terminate, so terminate using the byte count */
  ssize_t len = read(pipefd[0], buf, 4096);
  buf[len > 0 ? len : 0] = '\0';
  printf("reply ->\n```\n%s\n```\n", buf);
  return 0;
}

As you can see, there is some weirdness about capturing the output from the function do_query(). The reason for this is that the function prints to stdout, something that cannot be changed without either rewriting the function or reimplementing it. The entire purpose here is that we use the standard unedited implementation of whois.

For that reason, we pipe stdout to a new location, and then read it back after the completion of the function, where we can later process it.
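
The capture trick can be pulled out into a reusable helper. The following is a minimal sketch, not part of the whois source: capture_stdout() and the stand-in print_greeting() are hypothetical names, and the sketch assumes the wrapped function prints less than the pipe can hold (commonly 64 KiB on Linux), since nothing drains the pipe while the function runs.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: run fn(arg) with stdout redirected into a pipe, then
 * copy up to len - 1 bytes of what it printed into buf (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_stdout(void (*fn)(void*), void *arg, char *buf, size_t len){
  int saved, pipefd[2];
  ssize_t n;
  size_t total = 0;
  if(len == 0) return -1;
  fflush(stdout);                 /* Do not capture older buffered output */
  saved = dup(STDOUT_FILENO);
  if(saved < 0) return -1;
  if(pipe(pipefd) != 0){ close(saved); return -1; }
  dup2(pipefd[1], STDOUT_FILENO);
  close(pipefd[1]);               /* stdout is now the only write end */
  fn(arg);
  fflush(stdout);                 /* Push buffered text into the pipe */
  dup2(saved, STDOUT_FILENO);     /* Restore stdout, closing the write end */
  close(saved);
  while(total < len - 1 && (n = read(pipefd[0], buf + total, len - 1 - total)) > 0)
    total += (size_t)n;
  buf[total] = '\0';
  close(pipefd[0]);
  return (ssize_t)total;
}

/* Stand-in for a function that only prints, such as do_query() */
void print_greeting(void *arg){
  printf("hello %s", (const char*)arg);
}
```

Calling capture_stdout(print_greeting, "whois", out, sizeof out) leaves the printed text in out. Wrapping do_query() would need a small adapter function that unpacks the socket and query from arg.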

In theory, what we now have is a way to arbitrarily wrap almost any random C program as a C library.

Further Work

whois.h should offer some form of parsing on behalf of the implementer to save time. Perhaps a function like parse_output() that returns a struct of data it was able to find and parse.
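
As a rough sketch of what such a function might look like: the struct layout, the find_field() helper and the choice of the "netname" and "country" keys are all assumptions for illustration, based on the "key: value" line layout that RIPE-style responses tend to use. Other servers format their output differently.

```c
#include <string.h>

/* Hypothetical result type; the fields are illustrative only */
struct whois_info {
  char netname[64];
  char country[8];
};

/* Copy the value of the first "key:" line found in text into out.
 * Returns 1 if the key was found, 0 otherwise. */
static int find_field(const char *text, const char *key, char *out, size_t len){
  size_t klen = strlen(key);
  const char *line = text;
  while(line != NULL && *line != '\0'){
    if(strncmp(line, key, klen) == 0 && line[klen] == ':'){
      const char *v = line + klen + 1;
      size_t n;
      while(*v == ' ' || *v == '\t') v++; /* Skip padding after the colon */
      n = strcspn(v, "\r\n");
      if(n >= len) n = len - 1;
      memcpy(out, v, n);
      out[n] = '\0';
      return 1;
    }
    line = strchr(line, '\n');
    if(line != NULL) line++;
  }
  return 0;
}

/* Hypothetical parse_output(): returns the number of fields recovered */
int parse_output(const char *text, struct whois_info *info){
  memset(info, 0, sizeof *info);
  return find_field(text, "netname", info->netname, sizeof info->netname)
       + find_field(text, "country", info->country, sizeof info->country);
}
```

The caller would pass in the captured buffer and check the return value to see how many of the expected fields were actually present.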

Additionally, it would be kind to the servers to offer caching of requests (something that could be switched on and off). This is especially required if you are running tonnes of queries.
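
One simple way to sketch this is a small fixed-size table keyed by the query string. Everything here (the names, the sizes and the direct-mapped layout where a colliding query overwrites the old entry) is an assumption chosen to keep the example short. A real cache would also want entries to expire, as allocations change over time.

```c
#include <string.h>

#define CACHE_SLOTS 64     /* Illustrative sizes, not tuned */
#define CACHE_KEY_LEN 128
#define CACHE_VAL_LEN 4096

struct cache_entry {
  int used;
  char key[CACHE_KEY_LEN];
  char val[CACHE_VAL_LEN];
};

static struct cache_entry cache[CACHE_SLOTS];
static int cache_enabled = 1;

/* djb2 string hash, used to choose a slot */
static unsigned long hash_key(const char *s){
  unsigned long h = 5381;
  while(*s != '\0') h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Switch the cache on (non-zero) or off (zero) */
void cache_set_enabled(int on){ cache_enabled = on; }

/* Returns the stored response, or NULL on a miss or when disabled */
const char *cache_get(const char *query){
  struct cache_entry *e = &cache[hash_key(query) % CACHE_SLOTS];
  if(!cache_enabled || !e->used || strcmp(e->key, query) != 0) return NULL;
  return e->val;
}

/* Stores a (possibly truncated) copy of response under query */
void cache_put(const char *query, const char *response){
  struct cache_entry *e = &cache[hash_key(query) % CACHE_SLOTS];
  if(!cache_enabled || strlen(query) >= CACHE_KEY_LEN) return;
  strncpy(e->val, response, CACHE_VAL_LEN - 1);
  e->val[CACHE_VAL_LEN - 1] = '\0';
  strcpy(e->key, query);
  e->used = 1;
}
```

The wrapper would call cache_get() before opening a connection, and cache_put() once a response has been captured.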

It would also be great to see the wrapper offer a non-blocking function, where the caller has the ability to check whether the WHOIS query has completed yet. It should be possible to do this without threading if thought about carefully.
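
One thread-free building block for this is poll() with a zero timeout, which reports whether a descriptor has data waiting without ever blocking. The sketch below uses a hypothetical whois_ready() helper and assumes the wrapper keeps hold of the socket descriptor and has already sent the query itself. The stock do_query() blocks while reading, so it could not be used unchanged on this path.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: report whether fd has data (or end-of-file) waiting.
 * Returns 1 if a read() would not block, 0 if nothing is ready yet, and -1
 * on error. */
int whois_ready(int fd){
  struct pollfd p;
  int r;
  p.fd = fd;
  p.events = POLLIN;
  p.revents = 0;
  r = poll(&p, 1, 0); /* Timeout of 0: return immediately */
  if(r < 0) return -1;
  if(p.revents & (POLLERR | POLLNVAL)) return -1;
  return (p.revents & (POLLIN | POLLHUP)) ? 1 : 0;
}
```

The caller can then get on with other work and only read the response once whois_ready() returns 1.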