Coffee Space

Programming Concepts

The following are just some ideas I’ve not had time to flesh out yet, but which could be interesting to experiment with when I get some time.

Some of these ideas are cool, some of them are borderline crazy - make of them what you will.

`make_cached`

In C++, you have the idea of std::make_shared, which achieves:

Allocates and constructs an object of type T passing args to its constructor, and returns an object of type shared_ptr<T> that owns and stores a pointer to it (with a use count of 1).

An example use of this could be:

std::shared_ptr<int> bar = std::make_shared<int>(69);
std::cout << "*bar -> " << *bar << "\n";

Instead I would have something like:

// Hypothetical API: std::cached_ptr and std::make_cached do not exist
std::function<bool ()> check = [](){ return true; }; // Is the value still valid?
std::function<int ()> update = [](){ return 0; };    // Fetch a fresh value
std::cached_ptr<int> bar = std::make_cached<int>(69, check, update);
std::cout << "*bar -> " << *bar << "\n";

Looks similar, but this time the lambda functions do some additional work. When retrieving the value, check() is called to see whether the information held is still valid. If it returns true then the current value is returned, otherwise update() is run in order to retrieve a new value.

The idea here is that check() could check any number of things. It could test how much time has passed since the last update, whether a file on disk was updated, or any number of other observations. The update() function could then perform any number of tasks itself, including but not limited to querying a database, retrieving data from the network, etc.

Generally the checks would only be performed on retrieving the value, so that you only update the cached values you actually use. If this was used in conjunction with std::weak_ptr and some form of garbage collection, this could even contribute to memory reduction without much programmer interference, at a relatively small performance cost.

Add some thread safety to ensure nobody reads during an update() and you have quite an interesting data structure. In theory you could load near-infinite data into memory and only pay for what you use, as long as you have a way to retrieve data that was pushed out of cache.
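To make this a bit more concrete, here is a minimal sketch of how it might look in plain C++. The names cached_ptr and make_cached are my own and live outside of std, the value type is assumed to be default-constructible and copyable, and a single mutex is the simplest (not the fastest) way to stop anybody reading during an update():

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical type: the standard library has no cached_ptr or make_cached.
template <typename T>
class cached_ptr {
public:
    cached_ptr(T initial, std::function<bool()> check, std::function<T()> update)
        : state_(std::make_shared<State>()) {
        assert(check && update);  // both callbacks must be callable
        state_->value = std::move(initial);
        state_->check = std::move(check);
        state_->update = std::move(update);
    }

    // Returns a copy of the value, refreshing it first if check() reports
    // that the stored value is no longer valid.
    T operator*() const {
        std::lock_guard<std::mutex> lock(state_->mutex);
        if (!state_->check()) {
            state_->value = state_->update();
        }
        return state_->value;
    }

private:
    struct State {
        T value;                      // assumes T is default-constructible
        std::function<bool()> check;  // true while the value is still valid
        std::function<T()> update;    // produces a replacement value
        std::mutex mutex;             // serialises checks, updates and reads
    };
    std::shared_ptr<State> state_;    // copies of cached_ptr share one state
};

template <typename T>
cached_ptr<T> make_cached(T initial, std::function<bool()> check,
                          std::function<T()> update) {
    return cached_ptr<T>(std::move(initial), std::move(check), std::move(update));
}
```

With this in place, `auto bar = make_cached<int>(69, check, update);` followed by `*bar` behaves like the example above. Returning a copy rather than a reference is deliberate: a reference could be invalidated by an update() on another thread.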

Self-Modifying Code

I saw some post a while back (lost the link) about the possibility of self-modifying code in the Linux kernel. Apparently it is technically possible, but nobody is really doing this. Some potential benefits include:

Order of computation - Say you have a long set of if-statements, you might want to check the most likely case first as observed during operation.
High-use functions - Potentially you could put a set of high-use functions into the same block of memory, so that they can all sit in cache on the CPU.
High-use memory - Like previously mentioned, you could align some high-use memory into the same block so that they can sit together in the cache. Alternatively, you may choose to separate data that is used on different cores (looking at you AMD and your infinity architecture ¹).

Perhaps the easiest way to achieve this would be to recompile the source on the fly - but this would require a compiler to be available locally. On the plus side, it would handle the issues of memory offsets, etc - and if it compiles, there’s a good chance it should be okay.

You would likely need to define code in a function that can be swapped, something like:

void foo(const std::string& str){
  // REORDER_ME (hypothetical marker for a swappable block)
  if(str == "test"){
    /* Do something */
  }
  // REORDER_ME
  if(str == "another test"){
    /* Do something */
  }
  // REORDER_ME
  if(str == "even more test"){
    /* Do something */
  }
  /* Etc */
}
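Short of actually rewriting machine code, the order-of-computation idea can be approximated at runtime. The sketch below is not self-modifying code: it keeps the cases in a table, counts how often each one matches, and every few calls moves the most popular cases to the front. The class name reordering_dispatch, its methods and the interval of 4 are all made up for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tries each case in order and periodically moves the
// most frequently matched cases to the front of the list.
class reordering_dispatch {
public:
    void add_case(std::string key, std::function<void()> action) {
        cases_.push_back({std::move(key), std::move(action), 0});
    }

    // Runs the action of the first matching case; returns false if none match.
    bool run(const std::string& str) {
        for (auto& c : cases_) {
            if (str == c.key) {
                ++c.hits;
                c.action();
                maybe_reorder();  // may invalidate c, so return straight away
                return true;
            }
        }
        return false;
    }

    // The key that is currently checked first.
    const std::string& front_key() const { return cases_.front().key; }

private:
    struct entry {
        std::string key;
        std::function<void()> action;
        std::size_t hits;
    };

    // Every reorder_interval successful calls, sort by observed popularity.
    void maybe_reorder() {
        if (++calls_ % reorder_interval != 0) return;
        std::stable_sort(cases_.begin(), cases_.end(),
                         [](const entry& a, const entry& b) { return a.hits > b.hits; });
    }

    static constexpr std::size_t reorder_interval = 4;  // arbitrary choice
    std::vector<entry> cases_;
    std::size_t calls_ = 0;
};
```

This pays for an indirect call on every case, so it only makes sense when the checks themselves are expensive. The recompilation approach would avoid that overhead.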

Probabilistic Computation

Sometimes you don’t care that the result of a computation or the data stored in memory is exactly as it should be. In gcc for example you can already set -ffast-math and various other flags that give up floating-point precision and strict standards compliance in exchange for speed. I’ve used this for example in CPU-based neural networks.

One could imagine declaring something like:

~int a = 60;
~int b = 9;
~int res = a + b; // 69

Where ~int indicates ‘approximately integer’. The compiler for example might see this (ignoring pre-computation optimisations) and instead perform:

int a = 60;
int b = 9;
int res = a | b; // 61

Obviously you would only do this if allowed to do it, and only do it if it yielded some performance benefit. The use-case for this might be to quickly compare some numbers, where you only care if you are probabilistically picking the largest number. Another use-case could be in the early stages of training a neural network, where you are purposefully introducing large amounts of noise anyway.
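For this particular substitution the error is easy to pin down, since a + b is always equal to (a | b) + (a & b). The OR version therefore undershoots by exactly a & b, and is exact whenever the two numbers share no set bits. A small sketch, where approx_add and approx_add_error are names I made up to stand in for whatever a compiler would generate:

```cpp
#include <cstdint>

// Hypothetical stand-in for what a compiler might emit for "~int" addition.
constexpr std::uint32_t approx_add(std::uint32_t a, std::uint32_t b) {
    return a | b;
}

// a + b == (a | b) + (a & b), so the OR result is too small by exactly a & b.
constexpr std::uint32_t approx_add_error(std::uint32_t a, std::uint32_t b) {
    return a & b;
}

// The example from the text: 60 | 9 gives 61, and the missing 8 is 60 & 9.
static_assert(approx_add(60, 9) == 61, "OR-based approximation");
static_assert(approx_add_error(60, 9) == 8, "error term");
static_assert(approx_add(60, 9) + approx_add_error(60, 9) == 60 + 9, "identity");
```

Whether swapping an add for an OR is ever actually faster is a separate question. The point is only that the error of such a substitution can be reasoned about.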

Memory Decay

The idea here would be to allow memory to become decayed (changed unexpectedly) through various means and for various reasons. The main reason memory decay could be useful is to compress the data. You might have the following string:

char msg[] = "Eat meat!"; // Writable copy, a string literal must not be modified
mem_decay(msg);           // Hypothetical function
std::cout << msg << "\n"; // Out: "eat meat!"

In this case the ‘decay’ has made the data more easily compressed by having duplicate substrings ‘eat’. A more useful example might be in purposefully adjusting an image in order to get a better compression ratio.
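As a toy version of the string example, mem_decay() could be a lossy in-place transform that throws away letter case. This is only one possible meaning of ‘decay’, chosen because it reproduces the output above. A real version would presumably pick whatever change helps the compressor most:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical mem_decay: folds ASCII letters to lower case in place, so
// that "Eat" and "eat" become the same substring. The original case is lost.
void mem_decay(char* data) {
    assert(data != nullptr);
    for (std::size_t i = 0; data[i] != '\0'; ++i) {
        // The cast to unsigned char avoids undefined behaviour in tolower()
        // for negative char values.
        data[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(data[i])));
    }
}
```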

Force Malloc

Note: There’s a good chance this one already exists.

One thing that can trip people up is the kernel only allocating memory at the time it is accessed, not at the time it is requested. It would be nice to have a function that forced the kernel to actually allocate the memory up front, rather than pretend to, with little to no CPU overhead.

A crappy solution would be to manually access each part:

char* data = static_cast<char*>(malloc(SIZE));
for(size_t x = 0; x < SIZE; x++){
  data[x] = 0; // Writing to the memory forces the kernel to back it
}

But it would be cool if we could get the kernel to actually just allocate the requested memory at the time of the request.
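On Linux at least, something close to this does seem to exist already: mmap() accepts a MAP_POPULATE flag that asks the kernel to prefault the pages of a mapping when it is created. A minimal sketch, where the helper names alloc_populated and free_populated are my own and only mmap, munmap and the flags are the real API:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper (Linux only): returns memory that the kernel was asked
// to back with real pages up front, or nullptr if the mapping failed.
void* alloc_populated(std::size_t size) {
    assert(size > 0);
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Memory from alloc_populated() must be released with munmap(), not free().
void free_populated(void* p, std::size_t size) {
    if (p != nullptr) {
        munmap(p, size);
    }
}
```

As far as I understand it, the prefaulting is best effort, so this is a strong hint rather than a guarantee. It also does not change how plain malloc() behaves.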

`future_result()`

This one is truly hypothetical and likely completely impossible, but I can’t quite shake the idea of a function that computes data before it is requested. It’s not really well defined, but I wanted to write it down anyway.

Maybe it could look something like this:

int num = get_user_number();
future_compute(num); // Compiler inserted
/* Perform some actions */
if(rand() % 1000 == 0){ // Result is only needed once in a while
  std::cout << "result -> " << future_result(num) << "\n";
}
future_terminate(); // Compiler inserted

In this case, the num value depends on something the user types and cannot be predicted in advance. If we can make the assumption that whatever is computed relies fully upon the parameters and is time independent, but is expensive to compute, we can begin to compute as soon as the parameters are known. We can do this in the background on a different thread using future_compute() and then join with that thread to get the result at future_result().

After getting the result we can choose to perform future_terminate(), or perhaps we continue to compute the result even if it was not used and cache it somewhere in case it is requested in the future.
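The manual version of this can already be written with std::async and std::future, so the hypothetical part is really just getting the compiler to insert the calls for you. A rough sketch, where expensive() is a made-up stand-in for the real computation and the two wrappers only borrow the names used above:

```cpp
#include <cassert>
#include <future>

// Hypothetical stand-in for an expensive computation that depends only on
// its argument: the sum of the squares 1*1 + 2*2 + ... + n*n.
long expensive(int n) {
    long sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += static_cast<long>(i) * i;
    }
    return sum;
}

// Plays the role of future_compute(): starts the work on another thread as
// soon as the parameter is known.
std::future<long> future_compute(int num) {
    return std::async(std::launch::async, expensive, num);
}

// Plays the role of future_result(): blocks until the value is ready.
long future_result(std::future<long>& pending) {
    assert(pending.valid());  // the result can only be collected once
    return pending.get();
}
```

There is no standard equivalent of future_terminate() here. A future returned by std::async cannot be cancelled, and its destructor waits for the task to finish, so an unused result is still paid for in full.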

¹ AMD’s infinity fabric is the future of server/desktop processors. It is the only way to mitigate silicon production risk, especially as we look towards putting more and more cores into a single package as Moore’s law hits a wall.