NOTE: This document will be updated when compression algorithm designs are created, or requirements from the operating system change.
Compression is the act of removing redundant information from an information stream, representing it as it's core self. This is a difficult problem, that is well known to have a limit that can be defined by looking at the entropy of the information you wish to compress:
E = log_2 p . p
The following will be assumed about the data, to allow theoretical application be discovered without writing the actual application (for the purpose of sanity checking):
This means that following can also be assumed:
This version involves using a reserved byte for indicating the start and end of a String section. This means that we have to use a byte at the start, but fortunately the end of a String is indicated using a NULL byte. Best case scenario, we would have a large String, so that we get a saving of 25 - ((25 * 7) / 8) = 3
. This is very little saving for just 3 bytes.
Saving: 3 out of 512 bytes
This version will be looking at taking pairs of matches and compressing them into a dictionary of fixed size, leaving an indicator in it's place. We are particularly looking for triple matches - as these are worth replacing. The probability of this occurring twice is likely because of the nature of the language, where some assembly instructions or calls will repeat several times.
A reasonable dictionary size would be:
[<1> Dictionary size][<1> Match byte][<3> Match]..[<3> Match]
The Dictionary size
is the number of Match
entries in the dictionary. Match byte
represents the byte to look for to look for a match. The Match
represents the original three bytes that were replaced, where it's position in the dictionary represents the index number next to the Match byte
in the compressed stream. The following are rules for the Match byte
in the compressed stream:
Match byte
should be the least represented byte in the stream to increase compression ratio.Match byte
can be represented in the compressed stream by using [<1> Match byte][<1> 255]
.Match byte
in the data stream represents the index in the dictionary for the replacement bytes.Assuming that we have three different repeats, where we see 2 repeats of each, we will compress 3 bytes * 3 patterns * 2 repeats = 18 bytes
. From these 18 bytes we should save 18 - ((3 bytes * 3 patterns) + dictionary size + match byte) = 7 bytes
.
Saving: 7 out of 512 bytes
Assuming we will have 16 programs in the final project, each 512 bytes in size - unless we save 512 / 16 = 32 bytes
per program reliably - we won't even save any space with the added decompression program. The algorithms will we be significantly more difficult in assembly to write reliably, the compression program will be difficult for external people to run and there is still the issue of non-fixed size decompressed program sizes.
Unless there are significant savings - this is not worth running.