Lying beside me is A Field Guide to Genetic Programming by Poli, Langdon, McPhee, et al. If you search the title online, it can be read for free.
My first introduction to evolutionary algorithms was genetic algorithms (GAs), some 10+ years ago in an AI class taught, I believe, by Michael L. Walters (a.k.a. Mick). As soon as I heard the explanation, an enormous light bulb went off in my brain: I had the knowledge to make a machine learn. I went home that night and coded it for myself. Very cool.
I was also aware of genetic programming (GP), but I largely dismissed it when I moved on to neural networks. Genetic programming is essentially GA, but the thing being evolved is code. The only thing that really changes is the structure being operated on; otherwise most concepts are similar.
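To make that concrete, here is a minimal GP sketch of my own (not from the field guide): programs are expression trees grown from a function set and a terminal set, and mutation swaps subtrees. Selection and crossover carry over from GA essentially unchanged.

```python
import operator
import random

# Minimal GP sketch: candidate programs are expression trees built from
# a function set (with arities) and a terminal set.
FUNCTIONS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]
TERMINALS = ['x', 1.0, 2.0]

def random_tree(depth=3):
    """Grow a random expression tree (a candidate program)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    fn, arity = random.choice(FUNCTIONS)
    return (fn, [random_tree(depth - 1) for _ in range(arity)])

def evaluate(tree, x):
    """Interpret a tree for a given input x."""
    if tree == 'x':
        return x
    if not isinstance(tree, tuple):
        return tree  # a numeric constant
    fn, children = tree
    return fn(*(evaluate(c, x) for c in children))

def mutate(tree, depth=2):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if not isinstance(tree, tuple) or random.random() < 0.2:
        return random_tree(depth)
    fn, children = tree
    i = random.randrange(len(children))
    return (fn, children[:i] + [mutate(children[i], depth)] + children[i + 1:])
```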
I had this vague feeling that GA/GP was far more powerful and worthy of investigation, but over time I dismissed it. I remember being at an AI conference in New Zealand, laughing along in my head with some guys dismissing work submitted on GAs - and I regret that.
Both neural networks and GP take some input $x$ and produce some output $y$ using some learned model $M$. We can consider this to be $y = M(x)$. There is then some fitness/error function that is used to adjust the model towards better outputs.
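As a toy illustration of that loop (my own sketch, with an arbitrary linear model and squared error): the representation of $M$ is what differs between GP and neural networks; the adjust-against-error loop does not.

```python
import random

def error(theta, data):
    """Squared error of the toy model M(x) = theta[0]*x + theta[1]."""
    return sum((theta[0] * x + theta[1] - y) ** 2 for x, y in data)

def hill_climb(data, steps=1000):
    """Random hill-climbing: keep a perturbation only if error drops."""
    theta = [random.uniform(-1, 1), random.uniform(-1, 1)]
    for _ in range(steps):
        candidate = [t + random.gauss(0, 0.1) for t in theta]
        if error(candidate, data) < error(theta, data):
            theta = candidate
    return theta

# e.g. hill_climb([(x, 3 * x + 1) for x in range(10)]) should land near [3.0, 1.0]
```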
The likes of LLMs are essentially just predictive models: given some current state $s_t$, predict a future state $s_{t+1}$, i.e. $s_{t+1} = M(s_t)$. An agent can explore a predictive model by proposing an action $a$, i.e. $s_{t+1} = M(s_t, a)$. Some more advanced agents may consider the result of following several future actions $a_1, \dots, a_n$, i.e. $s_{t+n} = M(\dots M(M(s_t, a_1), a_2) \dots, a_n)$ - the computation explodes for large action sequences.
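A sketch of why it explodes (all the names here are mine): exhaustively rolling the model forward over every action sequence of length $n$ costs $|A|^n$ rollouts.

```python
from itertools import product

def best_action_sequence(model, score, s0, actions, n):
    """Try every length-n action sequence against the predictive model:
    |actions|**n rollouts, hence the explosion for large n."""
    best_seq, best_score = None, float('-inf')
    for seq in product(actions, repeat=n):
        s = s0
        for a in seq:
            s = model(s, a)  # s_{t+1} = M(s_t, a)
        if score(s) > best_score:
            best_seq, best_score = seq, score(s)
    return best_seq
```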
The compute required for training and recall can be extremely large, so before this really became feasible on better computers, researchers developed reinforcement learning. The major idea was not to predict a future state $s_{t+1}$, but instead the best future action, i.e. $a = \pi(s_t)$.
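A hedged sketch of that idea using tabular Q-learning (the `env` interface is my assumption, gym-style): once trained, acting is a lookup rather than a rollout search.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Learn the best action directly; env.reset() -> s and
    env.step(a) -> (s, reward, done) are assumed."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                      # explore
                a = random.choice(actions)
            else:                                          # exploit
                a = max(actions, key=lambda a: Q[(s, a)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return lambda s: max(actions, key=lambda a: Q[(s, a)])  # a = pi(s)
```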
A large breakthrough in deep learning was figuring out how to build these deep structures. AlexNet arguably opened the door to these deeper structures, but it took a while to get there.
Given how important work on structures has been, it seems crazy that we do not lean into GP for designing and building them. Because these models are so large, what we need is either a representative simplified/reduced model, or a reduced-computation training/evaluation procedure.
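A sketch of what I mean (every choice here is arbitrary): evolve only the structure, here just a list of layer widths, and score candidates with a cheap proxy fitness such as a short training run on a reduced dataset.

```python
import random

WIDTHS = [16, 32, 64, 128]

def evolve_structures(proxy_fitness, pop_size=20, generations=10):
    """Evolve layer-width lists; proxy_fitness is a placeholder for a
    cheap evaluation (e.g. a few epochs on a subset of the data)."""
    pop = [[random.choice(WIDTHS) for _ in range(random.randint(1, 4))]
           for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=proxy_fitness, reverse=True)[:pop_size // 2]
        children = []
        for p in parents:
            child = list(p)
            child[random.randrange(len(child))] = random.choice(WIDTHS)
            if random.random() < 0.3 and len(child) < 6:
                child.append(random.choice(WIDTHS))  # occasionally deepen
            children.append(child)
        pop = parents + children
    return max(pop, key=proxy_fitness)
```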
I believe it is worth noting that whilst GP can be used for the structure, learning the weights, activation energies, etc. is still far more efficiently addressed by current learning techniques.
This could be an interesting place for future work…
What I want to revisit is a GP-based agent where a fitness/error function is not strongly defined. This would be a model of the form $a = A(M, s_t)$, where an algorithm $A$ searches over the predictive model $M$ and produces an action $a$.
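Reusing the GP pieces from earlier, a minimal sketch of $a = A(M, s_t)$: the evolved program scores the states the predictive model imagines, and no external fitness/error function appears at decision time.

```python
def act(score_program, model, state, actions):
    """Pick the action whose imagined outcome the evolved program rates
    highest; `evaluate` is the tree interpreter from the GP sketch above."""
    return max(actions, key=lambda a: evaluate(score_program, model(state, a)))
```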
What if we set up our search to look for the intrinsic motivation algorithm itself? Possibly a bigger question yet: how would we know we had found an interesting algorithm? Do we rate survival, exploration behaviour, collaboration with other agents, the ability to terraform the environment, the ability to approximate these measures cheaply? By what measure do we consider the agent successful?
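One way to operationalise these questions, as a sketch only (every interface here is an assumption): rate a candidate motivation program by the behaviour it induces in a cheap simulated world. Exploration is the proxy below; survival, collaboration, or terraforming would slot in the same way.

```python
def meta_fitness(score_program, model, s0, actions, steps=200):
    """Run a greedy agent driven by the candidate program and count how
    many distinct states it visits (states assumed hashable)."""
    s, visited = s0, {s0}
    for _ in range(steps):
        a = max(actions, key=lambda a: evaluate(score_program, model(s, a)))
        s = model(s, a)
        visited.add(s)
    return len(visited)
```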
The answer is: I’m not sure. But I think we’re asking the right questions.