Coffee Space


Listen:

C vs Java Large Arrays

Preview Image

NOTE: Updated 01/12/2016.

Introduction

I wanted to do a speed test to see how fast it is to process a large array for an idea I had for a neural network that is updated by some kernel method. I love and am very comfortable with Java, but am not completely against the idea of using another language if there is some benefit to doing so.

A language that is often pouted as being very fast is of course C - the reason why it is used in the Linux Kernel and appears in many other high end places. I decided to do a speed test in order to figure out whether the change would be worth it, so I devised a test to find out how much faster C would be that Java for processing large arrays.

There are some “gotchas” with this processing idea for the array - it has to be processed from the top to the bottom of the 2D array due to how the neurons speak to one another.

0001  ___ ___ ___
0002 | L | C | R |
0003 |___|___|___|
0004 |   | N |   |
0005 |___|___|___|
0006 |   |   |   |
0007 |___|___|___|

For the structure, N is equal to a summation of A, B and C - therefore they need to be processed first. This means that there is a limitation as to how parallelised the system can be made and care has to be taken in the threads, as will be seen in the below “Code” section.

Code

For the first two examples, hopefully you will see that the code is very much similar in many respects and performance wise they have been given equal opportunity.

Also worth noting is that the 2 dimensional array has been unravelled into a 1 dimensional array for the purpose of speed. As the array is pretty large (~1 billion bytes) it makes a different.

C

0008 #include <stdio.h>
0009 #include <time.h>
0010 
0011 #define size 30000
0012 #define total (size * size)
0013 #define cycles 100
0014 
0015 char* grid;
0016 clock_t start;
0017 clock_t end;
0018 
0019 int main(int argc, char** argv){
0020   printf("\nBillion Neuron Speed Test\n");
0021   printf("\nTotal neurons -> %i", total);
0022   printf("\nPer side -> %i", size);
0023   printf("\nCycles to test -> %i", cycles);
0024   printf("\nInitialising variable grid...");
0025   grid = (char*)malloc(sizeof(char) * total);
0026   printf("\nGrid initialised!");
0027   printf("\nExecuting speed test...");
0028   start = clock();
0029   int z;
0030   int x;
0031   for(z = 0; z < cycles; z++){
0032     for(x = 0; x < total; x++){
0033       grid[x] = x % 64;
0034     }
0035   }
0036   end = clock();
0037   printf("\nEnd of speed test!");
0038   printf("\nTest took %d ms", ((end - start) / 1000));
0039   printf("\n%f cycles per sec",  ((double)(cycles * 1000000) / (double)(end - start)));
0040 }

Java

0041 public class Speed{
0042   private static final int size = 30000;
0043   private static final int total = size * size;
0044   private static final int cycles = 100;
0045   private static byte[] grid;
0046   private static long start;
0047   private static long end;
0048 
0049   public static void main(String[] args){
0050     System.out.println("Billion Neuron Speed Test\n");
0051     System.out.println("Total neurons -> " + total);
0052     System.out.println("Per side -> " + size);
0053     System.out.println("Cycles to test -> " + cycles);
0054     System.out.println("Initialising variable grid...");
0055     grid = new byte[total];
0056     System.out.println("Grid initialised!");
0057     System.out.println("Executing speed test...");
0058     start = System.currentTimeMillis();
0059     int z;
0060     int x;
0061     for(z = 0; z < cycles; z++){
0062       for(x = 0; x < total; x++){
0063         grid[x] = (byte)(x % 64);
0064       }
0065     }
0066     end = System.currentTimeMillis();
0067     System.out.println("End of speed test!");
0068     System.out.println("Test took " + (end - start) + "ms");
0069     System.out.println(((double)(cycles * 1000) / (double)(end - start)) + " cycles per sec");
0070   }
0071 }

Java with Threads

0072 import java.util.concurrent.atomic.AtomicInteger;
0073 
0074 public class Speed2 extends Thread{
0075   private static final int numThreads = 3;
0076   private static final int size = 30000;
0077   private static final int total = size * size;
0078   private static final int cycles = 100;
0079   private static AtomicInteger next = new AtomicInteger(0);
0080   private static int cDone = numThreads;
0081   private static AtomicInteger done = new AtomicInteger(0);
0082   private static byte[] grid;
0083   private static long start;
0084   private static long end;
0085 
0086   public static void main(String[] args){
0087     System.out.println("Billion Neuron Speed Test\n");
0088     System.out.println("Total neurons -> " + total);
0089     System.out.println("Per side -> " + size);
0090     System.out.println("Cycles to test -> " + cycles);
0091     System.out.println("Initialising variable grid...");
0092     grid = new byte[total];
0093     System.out.println("Grid initialised!");
0094     System.out.println("Executing speed test...");
0095     start = System.currentTimeMillis();
0096     /* NOTE: Size must be a multiple of what is being processed! */
0097     int div = size / numThreads;
0098     Thread[] threads = new Thread[numThreads];
0099     for(int x = 0; x < numThreads; x++){
0100       threads[x] = new Speed2(div * x, div * (x + 1), 0, size);
0101       threads[x].start();
0102     }
0103     while(done.get() < (numThreads * cycles)){
0104       /* Process the threads */
0105       if(done.get() == cDone){
0106         /* Store future exit cycles */
0107         cDone = done.get() + numThreads;
0108         /* As we have all threads waiting, set the count to zero */
0109         next.set(0);
0110         /* Unfreeze all threads */
0111         for(Thread t : threads){
0112           t.interrupt();
0113         }
0114       }
0115     }
0116     end = System.currentTimeMillis();
0117     System.out.println("End of speed test!");
0118     System.out.println("Test took " + (end - start) + "ms");
0119     System.out.println(((double)(cycles * 1000) / (double)(end - start)) + " cycles per sec");
0120   }
0121 
0122   private int startX;
0123   private int endX;
0124   private int startY;
0125   private int endY;
0126 
0127   public Speed2(int startX, int endX, int startY, int endY){
0128     this.startX = startX;
0129     this.endX = endX;
0130     this.startY = startY;
0131     this.endY = endY;
0132   }
0133 
0134   public void run(){
0135     int c;
0136     int x;
0137     int y;
0138     int z;
0139     for(c = 0; c < cycles; c++){
0140       for(y = startY; y < endY; y++){
0141         z = y * size;
0142         for(x = startX; x < endX; x++){
0143           grid[z + x] = (byte)(x % 64);
0144         }
0145         /* Update next counter */
0146         next.incrementAndGet();
0147         /* Wait for other threads to finish */
0148         while((next.get() / numThreads) <= y){
0149           /* Wait in small loop */
0150         }
0151       }
0152       /* Let main loop we are done */
0153       done.incrementAndGet();
0154       /* Allow main thread to perform clean-up service */
0155       try{
0156         /* This thread should not be asleep that long! */
0157         Thread.sleep(1000);
0158       }catch(InterruptedException e){
0159         /* Do nothing */
0160       }
0161     }
0162   }
0163 }

Results

C

0164 $ gcc Speed.c
0165 $ ./a.out
0166 Billion Neuron Speed Test
0167 
0168 Total neurons -> 900000000
0169 Per side -> 30000
0170 Cycles to test -> 100
0171 Initialising variable grid...
0172 Grid initialised!
0173 Executing speed test...
0174 End of speed test!
0175 Test took 281309 ms
0176 0.355481 cycles per sec

Java

0177 $ javac Speed.java
0178 $ java Speed
0179 Billion Neuron Speed Test
0180 
0181 Total neurons -> 900000000
0182 Per side -> 30000
0183 Cycles to test -> 100
0184 Initialising variable grid...
0185 Grid initialised!
0186 Executing speed test...
0187 End of speed test!
0188 Test took 135388ms
0189 0.7386178981889089 cycles per sec

Java with Threads

0190 $ javac Speed2.java
0191 $ java Speed2
0192 Billion Neuron Speed Test
0193 
0194 Total neurons -> 900000000
0195 Per side -> 30000
0196 Cycles to test -> 100
0197 Initialising variable grid...
0198 Grid initialised!
0199 Executing speed test...
0200 End of speed test!
0201 Test took 228783ms
0202 0.43709541355782555 cycles per sec

[Update] C (optimised compile -O3)

0203 $ gcc -O3 Speed.c
0204 $ ./a.out
0205 Billion Neuron Speed Test
0206 
0207 Total neurons -> 900000000
0208 Per side -> 30000
0209 Cycles to test -> 100
0210 Initialising variable grid...
0211 Grid initialised!
0212 Executing speed test...
0213 End of speed test!
0214 Test took 73882 ms
0215 1.353493 cycles per sec

Conclusion

Well… I knew the JIT is pretty cool, but I was not expecting a single Threaded Java application to out perform a C application (NOTE: Before optimising the source code.) of near equal code at all. This is very unexpected. I hope this is not because Java is taking some shortcuts as it knows the byte arrays are never read from - this was the purpose of setting the array to be some function as opposed to a pre-determined value such as zero (which Java may have skipped over or done something cool).

I did expect the multi-threaded Java code to be worse that the single thread, as you always pay the cost of the worst thread, meaning if any of them are paused to execute different tasks the whole program takes the performance hit. For even larger tasks this code may work much more efficiently.