Coffee Space



Virtual Repository Environment


Here we discuss the learning points from creating the prototype NHS VRE system - mostly complaints about how Java and the internet work.

Reading From Network Stream

While implementing a very basic POST system, I found out some nice things about the way Java handles streams - in particular network streams. When writing a web server from scratch, there's normally no way around dealing with them at some point if you want to get the performance out of Java.

To start with, we read in the first chunk to figure out what sort of request we are dealing with - GET or POST (HEAD is not implemented on this server). In some more code above this excerpt, we figure out the positions of everything we are interested in, such as where the binary starts, how long the binary is, what the boundary markers for the binary are, etc.
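As a rough illustration of that first step, the method can be read straight off the start of the first chunk. This is a minimal sketch, not the server's actual code: `RequestSniffer`, `Method` and `sniff` are hypothetical names, and it assumes the request line begins at byte 0 of the buffer.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; none of these names come from the original server. */
final class RequestSniffer {
  enum Method { GET, POST, UNSUPPORTED }

  /** Classifies a request by the method token at the start of the first chunk. */
  static Method sniff(byte[] request, int length) {
    if (startsWith(request, length, "GET ")) return Method.GET;
    if (startsWith(request, length, "POST ")) return Method.POST;
    return Method.UNSUPPORTED; // HEAD and everything else
  }

  /** True if the first `length` valid bytes of data begin with the ASCII prefix. */
  private static boolean startsWith(byte[] data, int length, String prefix) {
    byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
    if (length < p.length) return false;
    for (int i = 0; i < p.length; i++) {
      if (data[i] != p[i]) return false;
    }
    return true;
  }
}
```

Passing the number of valid bytes separately matters because the buffer is usually larger than what the first read actually returned.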

In this state, we start with the following to make sure it’s worth going into a heavy read mode:

/* Write the file */
int rSize = r;
filesize -= r;
/* Make sure end is not in data to be written */
if(filesize > readSize){

Next we begin reading into the file as fast as we can:

  os.write(request, startFileBin, r - startFileBin);
  rSize = filesize > readSize ? readSize : filesize;
  r = is.read(request, 0, rSize);
  filesize -= r;
  while(filesize > readSize){
    os.write(request, 0, r);
    rSize = filesize > readSize ? readSize : filesize;
    r = is.read(request, 0, rSize);
    filesize -= r;
  }

Note that r can be 0, meaning no bytes were read, even though there are still bytes to come from the stream! Worse yet, we can read -1 to indicate “end of stream” without having collected all the bytes that the documentation suggests we should have. This can be due to the client having a slow hard disk, the network being slow or any number of other reasons.

For this reason, we do a very careful read of the end:

  byte[] oRequest = new byte[readSize * 4];
  System.arraycopy(request, 0, oRequest, 0, r);
  int oReqLen = r;
  filesize = (readSize * 4) - oReqLen;
  try{
    while(filesize > 0 && r >= 0){
      rSize = filesize > readSize ? readSize : filesize;
      r = is.read(oRequest, oReqLen, rSize);
      /* Only count real data; -1 (end of stream) must not shift the lengths */
      if(r > 0){
        oReqLen += r;
        filesize -= r;
      }
    }
  }catch(SocketTimeoutException e){
    /* bytesTransferred is inherited from InterruptedIOException */
    r = e.bytesTransferred;
    oReqLen += r;
  }
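The short-read behaviour described above can be isolated in a small helper. The following is a sketch under assumptions rather than the server's code: `StreamReads` and `readUpTo` are hypothetical names, and the policy of keeping whatever arrived before an end of stream or a timeout is simply carried over from the excerpt.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

/** Hypothetical helper; the names are illustrative only. */
final class StreamReads {
  /**
   * Keeps reading until `wanted` bytes have arrived, the stream ends,
   * or the socket times out. Returns how many bytes were stored in buf.
   */
  static int readUpTo(InputStream is, byte[] buf, int wanted) throws IOException {
    int total = 0;
    try {
      while (total < wanted) {
        int r = is.read(buf, total, wanted - total);
        if (r < 0) break; // end of stream: keep what we have
        total += r;       // r may be smaller than what was asked for
      }
    } catch (SocketTimeoutException e) {
      // Count any bytes moved before the timeout, without overrunning `wanted`.
      total = Math.min(wanted, total + e.bytesTransferred);
    }
    return total;
  }
}
```

The return value, not the requested size, is what any later search or write should treat as the amount of valid data.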

Whatever we are able to read from the stream, we hunt for an ending to the file using a KMPMatch through the byte stream.

  int endFileBin = KMPMatch.indexOf(oRequest, bound.getBytes());
  for(int x = endFileBin - 1; x >= 0; x--){
    if(oRequest[x] == '\r'){
      endFileBin = x;
      break;
    }
  }
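The source of `KMPMatch` is not shown here, so the following is only a sketch of what a Knuth-Morris-Pratt search over a byte array could look like. `ByteSearch` and its method names are hypothetical, and the explicit `from`/`to` range is an assumption added so that only the bytes actually read get searched.

```java
/** Hypothetical stand-in for a byte-array KMP search; not the original KMPMatch. */
final class ByteSearch {
  /** Returns the index of the first occurrence of pattern in data[from, to), or -1. */
  static int indexOf(byte[] data, int from, int to, byte[] pattern) {
    if (pattern.length == 0) return from;
    int[] fail = failureTable(pattern);
    int matched = 0; // number of pattern bytes currently matched
    for (int i = from; i < to; i++) {
      // On a mismatch, fall back to the longest prefix that is still a suffix.
      while (matched > 0 && data[i] != pattern[matched]) matched = fail[matched - 1];
      if (data[i] == pattern[matched]) matched++;
      if (matched == pattern.length) return i - pattern.length + 1;
    }
    return -1;
  }

  /** fail[i] = length of the longest proper prefix of pattern[0..i] that is also its suffix. */
  static int[] failureTable(byte[] pattern) {
    int[] fail = new int[pattern.length];
    int k = 0;
    for (int i = 1; i < pattern.length; i++) {
      while (k > 0 && pattern[i] != pattern[k]) k = fail[k - 1];
      if (pattern[i] == pattern[k]) k++;
      fail[i] = k;
    }
    return fail;
  }
}
```

Because the search never backs up in `data`, it stays linear in the buffer length even when the file content happens to contain long runs that resemble the boundary.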

Saving something is better than saving nothing, so if the ending cannot be found we write out what we have - although this has never yet happened.

  /* If we fail to find the end, spit out what we have */
  if(endFileBin < 0){
    /* The best end we have */
    endFileBin = oReqLen;
  }
  os.write(oRequest, 0, endFileBin);
  System.out.println("File written."); // TODO: Remove me.

If the file was never that big, there is no need to load anything other than what we already have.

}else{
  /* Only a small file - only process current request */
  int endFileBin = KMPMatch.indexOf(request, startFileBin, bound.getBytes());
  for(int x = endFileBin - 1; x >= 0; x--){
    if(request[x] == '\r'){
      endFileBin = x;
      break;
    }
  }
  /* If we fail to find the end, spit out what we have */
  if(endFileBin < 0){
    endFileBin = r;
  }
  os.write(request, startFileBin, endFileBin - startFileBin);
}

Browsers Uploading Files

It appears that browsers are limited to uploads of 2GB, at least in both 32-bit and 64-bit Firefox. This will become more of a problem in the future as files grow larger and larger due to the ever more demanding users of our systems.
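A 2GB ceiling is what a signed 32-bit size gives you: Integer.MAX_VALUE is 2,147,483,647 bytes. Whether that is what limits Firefox is an assumption, not something established here, but the read loop above also appears to track the remaining size as an int, so the server side would hit the same wall. A minimal sketch of keeping the size in a long instead, with hypothetical names (`ContentLength`, `parse`, `nextChunk`):

```java
/** Hypothetical helper; not part of the original server. */
final class ContentLength {
  /** Parses a Content-Length value; returns -1 if it is not a non-negative number. */
  static long parse(String value) {
    try {
      long n = Long.parseLong(value.trim());
      return n >= 0 ? n : -1;
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  /** Size of the next read: never more than the buffer, never more than what remains. */
  static int nextChunk(long remaining, int readSize) {
    return (int) Math.min(remaining, (long) readSize);
  }
}
```

Only the per-read chunk size needs to fit in an int, since that is what the array-based read and write calls accept; the running total can stay a long throughout.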

Conclusion

This write-up is far from complete, with only a fraction of the problems discussed here. Hopefully it provides some insight into the difficulties associated with writing an HTTPS web server from scratch to handle GET, POST and a database securely.

Who knows what the future holds?