Coffee Space


128 Bit Computing

On HackerNews, somebody posted this article from 1995! One of the interesting future predictions is for 128 bit computing:

FORECASTS for 64->128-bit transition:
1) If memory density continues to increase at the same rate,
and virtual memory pressure retains the 4:1 ratio, and we think we've just
added 32 more bits, to be consumed 2 bits/3 years, we get:
        3*32/2 = 48 years
and I arbitrarily pick 1995 as a year when:
        a) There was noticable pressure from some customers for 4GB+
        physical memories, and a few people buying more, in "vanilla"
        systems.
        b) One can expect 4 vendors to be shipping 64-bit chips,
        i.e., not a a complete oddity.
Hence, one estimate would be 1995+48 = 2043 to be in leading edge of
64->128-bit transition, based on *physical memory* pressure.
That is: the pressure comes from the wish to conveniently address the
memory that one might actually buy.
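The arithmetic in the quote is easy to replay. The snippet below is only a restatement of the quoted numbers (32 extra bits, consumed at 2 bits every 3 years, starting from 1995); the function name `years_to_consume` is made up for illustration.

```python
def years_to_consume(extra_bits, bits_per_step=2, years_per_step=3):
    """Years needed to use up `extra_bits` of address space at the quoted rate."""
    return extra_bits * years_per_step / bits_per_step

start_year = 1995                # the year picked in the quote
years = years_to_consume(32)     # 32 bits were added going from 32 to 64 bit
print(years, start_year + years)
```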

I then replied with the following, on why 128 bit computing may see various pressures by 2043:

I think 128 bit computers will come around eventually, despite it having been declared that 64 bit is "enough". Some pressures may come from:

These are just things I've noticed; I imagine there are others too. The prediction of 2043 is still quite realistic, and I wouldn't be surprised if we beat it.

I was quite disappointed to see many Linux distros give up on 32 bit support because it was too much effort to maintain. It probably points towards some crappy code that is highly dependent on the platform.

I'll address a few of these points here...

Hashing

Regarding the point about hashing, after my own experiments the other day, I realised that larger hashes are important for reducing hash collisions, and that larger native integers reduce the computation needed to produce them. Anyway, I open-sourced my experiments so other people can play with this.
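To give a feel for how much the hash width matters, here is a rough sketch using the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)). This is not the experiment mentioned above; it assumes an ideal, uniformly distributed hash, and the function name `collision_probability` is hypothetical.

```python
import math

def collision_probability(n_items, hash_bits):
    """Approximate chance of at least one collision among n_items hashes."""
    space = 2.0 ** hash_bits
    exponent = n_items * (n_items - 1) / (2.0 * space)
    # -expm1(-x) evaluates 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)

for bits in (32, 64, 128):
    print(bits, collision_probability(1_000_000, bits))
```

With a million items, a 32 bit hash is all but certain to collide, while the wider hashes make a collision very unlikely.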

Custom Instructions

One exciting thing 128 bit computing could enable is custom instructions. This part was not so suitable for a comment over on HackerNews as it requires a lot of unproven speculation.

Firstly, most mixed-width instructions could then feasibly fit into a single register. This would mean they could be processed more quickly (in theory).

Secondly, there is the possibility of writing a small in-place VM. You would want the first two bytes to define the custom instruction to be run and the remaining 14 to act as the VM, such that the CPU looks for the pattern OOxx xxxx xxxx xxxx. The instruction could look as follows:

+--8bit--+--8bit--+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+
v        v        v      v      v      v      v      v      v
 [opcode] [opcode] [byte] [byte] [byte] [byte] [byte] [byte]

    +-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+
    v      v      v      v      v      v      v      v      v
     [byte] [byte] [byte] [byte] [byte] [byte] [byte] [byte]

You would treat the [byte] bytes in pairs, where the first is the VM operation and the second is data <vmop, data> - but they would be split as follows:

+--8bit--+--8bit--+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+
v        v        v      v      v      v      v      v      v
 [opcode] [opcode] [vmop] [vmop] [vmop] [vmop] [vmop] [vmop]

    +-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+-8bit-+
    v      v      v      v      v      v      v      v      v
     [vmop] [data] [data] [data] [data] [data] [data] [data]

This means that the resulting data can easily be obtained by accessing the lower register. This would likely be the easiest way to pass parameters into the instruction.
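The layout above can be sketched with plain Python integers standing in for a 128 bit register. This is only an illustration: the names `pack_instruction` and `extract_data` are made up, and reading the diagram left to right as most significant byte first is an assumption.

```python
from typing import Sequence

DATA_BITS = 7 * 8  # the seven [data] bytes sit in the lowest 56 bits

def pack_instruction(opcode: int, vmops: Sequence[int], data: Sequence[int]) -> int:
    """Pack a 16 bit opcode, seven vmop bytes and seven data bytes into 128 bits."""
    if len(vmops) != 7 or len(data) != 7:
        raise ValueError("expected exactly 7 vmops and 7 data bytes")
    # Assumed byte order: opcode first (most significant), data last.
    raw = opcode.to_bytes(2, "big") + bytes(vmops) + bytes(data)
    return int.from_bytes(raw, "big")

def extract_data(word: int) -> int:
    """Mask off everything except the seven data bytes."""
    return word & ((1 << DATA_BITS) - 1)

word = pack_instruction(0xABCD, [1, 2, 3, 4, 5, 6, 7],
                        [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16])
print(hex(word), hex(extract_data(word)))
```

Because the data bytes are grouped at the low end, reading the result back is a single mask, with no shifting around the interleaved vmops.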

A vmop would itself be split into an instruction and offset. The 4 bit instruction could be one of 16 instructions (note I have not checked if this is really suitable, but it seems like a good start):

Bit pattern  Instruction  Description
0b0000       hlt          Halt VM, continue execution of program
0b0001       jmp          Jump to a given offset in any case
0b0010       jz           Jump if zero to offset
0b0011       jnz          Jump if not zero to offset
0b0100       movi(this)   Move this data to offset location
0b0101       mova(that)   Move data at offset to this location
0b0110       noti(this)   Bitwise NOT on this data, store at offset
0b0111       nota(that)   Bitwise NOT on that data, store here
0b1000       ori(this)    Bitwise OR on this data, store at offset
0b1001       ora(that)    Bitwise OR on that data, store here
0b1010       andi(this)   Bitwise AND on this data, store at offset
0b1011       anda(that)   Bitwise AND on that data, store here
0b1100       xori(this)   Bitwise XOR on this data, store at offset
0b1101       xora(that)   Bitwise XOR on that data, store here
0b1110       addi(this)   Add this data to offset location
0b1111       adda(that)   Add data at offset to this location
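Decoding a vmop byte against the table above could look roughly like this. The text only says a vmop is split into a 4 bit instruction and an offset; putting the instruction in the high nibble and the offset in the low nibble is an assumption, as is the name `decode_vmop`.

```python
# Mnemonics indexed by their 4 bit pattern, in the same order as the table.
MNEMONICS = [
    "hlt", "jmp", "jz", "jnz",
    "movi", "mova", "noti", "nota",
    "ori", "ora", "andi", "anda",
    "xori", "xora", "addi", "adda",
]

def decode_vmop(vmop: int):
    """Split one vmop byte into (mnemonic, offset)."""
    if not 0 <= vmop <= 0xFF:
        raise ValueError("a vmop is a single byte")
    instruction = vmop >> 4   # assumed: high nibble selects the instruction
    offset = vmop & 0x0F      # assumed: low nibble is the offset
    return MNEMONICS[instruction], offset

print(decode_vmop(0xE3))
```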

A subtraction is just a negative add, a multiplication is just multiple adds, and a division is multiple subtractions.
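As a quick sanity check of that claim, here is a sketch that builds the other operations from only NOT and ADD on 8 bit values, mirroring what a single [data] byte could hold. The helper names are hypothetical, negation uses two's complement, and division is unsigned only.

```python
MASK = 0xFF  # one data byte

def add8(a, b):
    return (a + b) & MASK            # 8 bit add with wraparound

def neg8(a):
    return add8(~a & MASK, 1)        # two's complement: NOT, then add 1

def sub8(a, b):
    return add8(a, neg8(b))          # subtraction is a negative add

def mul8(a, b):
    result = 0
    for _ in range(b):               # multiplication is repeated adds
        result = add8(result, a)
    return result

def div8(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    while a >= b:                    # division is repeated subtractions
        a = sub8(a, b)
        quotient += 1
    return quotient, a               # (quotient, remainder)

print(sub8(10, 3), mul8(6, 7), div8(43, 6))
```

The loops make it obvious that multiplication and division cost many VM cycles, which matters given how few slots are available.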

NOTE: A really powerful idea here that may be missed is that vmop can be overwritten on the fly, and so can the original opcode, allowing a different VM or special instruction to be run.

WARNING: It may be desired to throw an error flag somewhere if some maximum number of cycles is reached, rather than get stuck in an infinite loop. The error flag is required to allow the program to know that this occurred. The error may also be required to allow the program to know that an invalid instruction was requested (i.e. opcode is overwritten with something impossible).
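A cycle limit of that kind might be sketched as below. This is a deliberately tiny stand-in, not a full VM: only hlt and jmp are interpreted, every other vmop is treated as a no-op, the offset is assumed to be an absolute slot index that wraps around, and `Status`, `run` and `max_cycles` are all made-up names.

```python
from enum import Enum

class Status(Enum):
    HALTED = 0        # the VM reached hlt normally
    CYCLE_LIMIT = 1   # error flag: the cycle budget ran out

HLT, JMP = 0x0, 0x1

def run(vmops, max_cycles=64):
    """Step through vmops, giving up after max_cycles steps."""
    pc = 0
    for _ in range(max_cycles):
        instruction, offset = vmops[pc] >> 4, vmops[pc] & 0x0F
        if instruction == HLT:
            return Status.HALTED
        if instruction == JMP:
            pc = offset % len(vmops)    # assumed: absolute slot index, wrapped
        else:
            pc = (pc + 1) % len(vmops)  # other vmops are no-ops in this sketch
    return Status.CYCLE_LIMIT

print(run([0x10] * 7))                                   # jumps to itself forever
print(run([0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]))   # jumps to a hlt
```

Returning a status rather than raising keeps the sketch close to the idea of a flag the calling program checks after the instruction completes.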

I think the idea is potentially cool, but it would need testing to see if it has any real legs. It's not clear whether useful programs can be built in the few available bytes, or whether any real speed-up would be gained from using a tiny VM like this. The fact that it can do tonnes of computation without any fetches from RAM, etc., may yield interesting results.

If somebody wants to pick this project up, I think the first step would be to build a small assembler and VM to see what kinds of programs could potentially be written within a VM of just 7 instructions (maybe 15 with a long long 256 bit wide register). Some potentially interesting programs could be:

Let me know what you think!