Boosting neuroevolution with inbreeding

The condition of my wrist is still delaying the implementation of the GPU-accelerated version of my neuroevolutionary program, but in the meantime I happened to read something rather interesting that stirred my curiosity about certain evolutionary strategies. The book in question is titled The Quest of Human Quality: How to Rear Leaders by Anthony M. Ludovici.

Ludovici’s central thesis is that mere special education is not enough to create a ruling class worthy of its status: you also need to take great care in breeding the right sort of people. He argues that the best examples of ruling aristocracies in history were those of ancient Egypt and Venice, while many other ruling classes in Europe and elsewhere hardly deserve the title of aristocracy.

By some counts ancient Egypt maintained a relatively high form of civilization for close to five thousand years, beating any contender in longevity by a rather wide margin. So it seems they did certain things right, and Ludovici argues that one very important thing they did right was practicing inbreeding, to the degree that among the ruling class it was common for a brother to marry his sister. I seem to recall that some historians have argued close inbreeding was nevertheless uncommon among the common people, but Ludovici thinks it happened relatively often there too, due to the tendency of the lower classes to imitate the upper classes.

Now, as I have mentioned before, Neurogenesis uses an evolutionary strategy called Enforced Subpopulations (ESP) that has received more than a little support in previous research on evolutionary learning. It works by separately evolving subpopulations of artificial neurons or memory blocks, each of which is linked to a specific location in the network topology. Complete networks are formed by selecting a single unit from each subpopulation. So it is, of course, a form of inbreeding. And it works relatively well, as it tends to result in more harmonious networks than some alternative strategies.
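That assembly step can be sketched in a few lines of Scala (the names here are illustrative, not the actual Neurogenesis classes):

```scala
import scala.util.Random

// ESP sketch: each subpopulation corresponds to one slot in the network
// topology, and a candidate network takes exactly one unit from each.
case class NeuronUnit(id: Int, weights: Vector[Double])

def assembleNetwork(subpops: Seq[Vector[NeuronUnit]], rng: Random): Vector[NeuronUnit] =
  subpops.map(pop => pop(rng.nextInt(pop.size))).toVector
```

After evaluation, each unit’s fitness is credited back to its own subpopulation, which is what keeps the “breeding lines” separate.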

Ludovici also states that to form an aristocracy you first need an isolated population that through inbreeding “has become standardized or uniform in regard to all characters”. From this population you then select exemplars that, through further inbreeding, form into a genuine aristocratic class that still has a strong connection to the people it rules, and thus has their interests in mind much better than a ruling class that comes from the outside and often holds rather different values.

Reading that, I couldn’t help thinking that Neurogenesis also evolves full networks separately from the “common stock”, although it allows some upward mobility: whenever a “commoner” beats the worst member of the “neuroaristocracy”, it takes its place.
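That upward-mobility rule amounts to a simple elitist replacement; a sketch (assuming higher fitness is better, and with made-up names):

```scala
// Replace the worst member of the elite with a challenger from the
// common stock whenever the challenger scores a better fitness.
def promote(eliteFitness: Vector[Double], challenger: Double): Vector[Double] = {
  val worstIdx = eliteFitness.indexOf(eliteFitness.min)
  if (challenger > eliteFitness(worstIdx)) eliteFitness.updated(worstIdx, challenger)
  else eliteFitness
}
```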

Still, one can’t say Neurogenesis promotes incestuous pairings between neurons, as they are currently combined more or less randomly. It might be interesting to test sometime in the future how well, for example, enforced “first cousin” pairings would work in the neuroevolutionary setting. Interesting enough that I just might implement such a strategy as soon as my wrist allows full-time programming once again.


Working on a Cuda branch of Neurogenesis

The brutal fact of competition is that more often than not the fast will outmaneuver the slow and so it probably is also in the field of neuroevolution.

After some testing of various alternatives, I’m now trying to make at least the ‘Evolino’ mode utilize the power of GPUs (though only those made by NVIDIA), as it’s the mode that is perhaps the easiest to accelerate: the required linear regression, with its matrix multiplications, inversions, and transposes, can simply be moved to the graphics processing unit. Beyond that, I will certainly try to move the error calculations to the GPU as well, since they can be done by computing the mean squared difference of two matrices.
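As a reference for what the GPU kernel has to compute, the error term is just the mean squared difference of the target and output matrices; a plain CPU sketch:

```scala
// Mean squared difference of two equally sized matrices; a GPU version
// would compute the same quantity with an element-wise kernel plus a reduction.
def meanSquaredError(target: Array[Array[Double]],
                     output: Array[Array[Double]]): Double = {
  require(target.length == output.length && target(0).length == output(0).length)
  var sum = 0.0
  for (i <- target.indices; j <- target(i).indices) {
    val d = target(i)(j) - output(i)(j)
    sum += d * d
  }
  sum / (target.length * target(0).length)
}
```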

In theory I could of course also move things like activation propagation to the GPU, but I probably won’t be doing that anytime soon, as it would change the nature of the program too much. The recombination could also be done in parallel, but based on my current understanding such code would not run very efficiently on a GPU, as it contains too much branching (‘if then’ expressions).

Generic NeuralConnections and other plans

Last week I finally managed to get my new, more generic classes to work with the refactored program, but there are still a few problems with some unexpected behaviour that need to be solved. I also had to make one rather serious compromise to my intended design: currently, the one extra generic variable in my NeuralConnections class has to be Numeric, which excludes the Boolean values I was using in my most basic class. So later I may still move to Scala’s AnyVal, but that probably requires adding weird implicit conversions or some other little trickery to get the exact functionality I have been thinking about. Anyway, it has certainly been a learning experience so far.
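The compromise can be illustrated with a small sketch (hypothetical class names, not the actual ones):

```scala
// A Numeric context bound admits Int, Double, etc., but not Boolean,
// which has no Numeric instance in the standard library:
class NumericSlot[T: Numeric](val extra: T) {
  def doubled: T = implicitly[Numeric[T]].plus(extra, extra)
}
// new NumericSlot(true)   // would not compile: no Numeric[Boolean]

// Relaxing the bound to AnyVal admits Boolean again, but then any
// arithmetic on `extra` needs implicit conversions or similar trickery:
class AnyValSlot[T <: AnyVal](val extra: T)
```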

At some point I will probably also move to Scala version 2.10+, which requires more than a few changes to the program, as it makes the original Actors (used to achieve multithreading) obsolete. I’m not yet quite sure what the benefit of using the supposedly more modern Akka Actors is, but presumably there is some, and if and when I realize just what the actual benefit in my case would be, I will surely hasten my programming efforts to grab it.

Delayed generics

Some distractions have kept me from finding the time to do the necessary refactoring of the Neurogenesis code to make it possible to switch between different implementations of NeuralConnections. It also turns out I have harboured some misconceptions about how generics actually work in the Scala language.

The first misconception I apparently had was the thought that defining a class like this

class Container[@specialized(Int,Double) T](value: T) {...

restricts it to work only on those two primitives, when in fact the annotation only makes the compiler generate specialized variants that avoid the cumbersome boxing of those primitives; any other type still works through the generic version.
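A quick check shows the annotation places no restriction at all; it only changes which bytecode gets generated:

```scala
// @specialized merely tells the compiler to emit extra unboxed variants
// for Int and Double; other types still take the generic (boxed) path.
class Container[@specialized(Int, Double) T](val value: T)

val ci = new Container(42)        // hits the Int-specialized variant
val cs = new Container("hello")   // still compiles fine: T = String
```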

Worse, I discovered that things like

def addConnection[T](d:Int,w:Double,t:T): Boolean = {…

meant to add connections to a TreeMap[Int, (Double, T)] in my generic NeuralConnections[T] (a subclass of the even more generic AbstractNeuralconnections, which declares that abstract method) don’t quite seem to work as I had thought, probably due to Scala’s strict type system. Still, I hope that by next week I will have managed to find a relatively neat way of doing things…
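For what it’s worth, one arrangement where the types do line up is keeping the type parameter on the class only; declaring an extra `[T]` on the method itself introduces a fresh type parameter that shadows the class’s, turning the method into an overload instead of an override. A minimal sketch (illustrative, not the actual Neurogenesis code):

```scala
import scala.collection.immutable.TreeMap

abstract class AbstractNeuralconnections[T] {
  def addConnection(d: Int, w: Double, t: T): Boolean
}

class NeuralConnections[T] extends AbstractNeuralconnections[T] {
  // destination -> (weight, extra payload of type T)
  private var conns = TreeMap.empty[Int, (Double, T)]

  // No [T] here: redeclaring it would shadow the class parameter
  // and this method would no longer implement the abstract one.
  def addConnection(d: Int, w: Double, t: T): Boolean =
    if (conns.contains(d)) false
    else { conns += (d -> (w, t)); true }

  def size: Int = conns.size
}
```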

Refactoring neural connections

The first thing I tried when I began to program Neurogenesis around a year ago was Scala’s @specialized construct, which would have allowed the user to easily switch between Float and Double precision. However, I eventually ran into some problems (which might have had something to do with my inexperience with the Eclipse Scala IDE and the need to semi-regularly ‘clean’ the project) and decided to use only Double precision (perhaps overkill in most situations).

Since then there have been several eureka moments when I have realized the need for more generic classes, and I have had to do more than a little extra work to implement those ideas in the existing code base. Most recently I realized it might be nice to let the user choose between several types of NeuralConnections to facilitate experimentation. Thus I’m currently in the process of refactoring the program to use a trait NeuralConnections[T], which should make it possible to implement, for example, my idea of making previously learned structure more rigid after each sequential step of complexification.

I can’t guarantee it will be worth the effort, but there’s a chance it might be, as I’m not aware of anyone having tried the idea before (though I wouldn’t be very surprised if someone has, as there was certainly more than a little work done in the field of neuroevolution long before I got interested in it).



Doing things a bit differently

For the sake of disclosure, I got the main ideas for my neuroevolutionary program Neurogenesis from the 2005 paper “Modeling Systems with Internal State using Evolino” by Wierstra et al. However, I did end up doing certain things differently.

Neurogenesis does use the same evolutionary strategy, ESP, pretty much as detailed in the article: it evolves separate subpopulations of cells from which the networks used in sequence prediction are formed. However, rather than evolving only memory cells (or blocks), which in the original work also include the input connections, the program evolves input cells (plus output cells) with both forward and recurrent connections separately from the memory blocks, which likewise have both forward and feedback connections. I would think this choice leads to trying out more networks that work badly, but it also helps keep the results more diverse, and the hope is that at least at times this will pay off.

Update: After re-reading those past papers, it seems the evolutionary strategy Neurogenesis actually uses is closer to the one known as Hierarchical Enforced SubPopulations, described in the article “Co-Evolving Recurrent Neurons Learn Deep Memory POMDPs” by Gomez and Schmidhuber, as the program evolves both neurons (in subpopulations) and full networks, and occasionally novel cell discoveries from the network level are injected into the neuron level.

Yet more ideas for improving neuroevolution

So far I haven’t been very satisfied with the actual performance of Neurogenesis when trying to evolve suitable recurrent neural networks. I suspect part of this is due to the fact that the current version doesn’t discriminate between the kinds of connections that are allowed: anything goes. You can have recurrent connections from any cell to any other cell, including a loop from a cell back to itself. You can also have forward connections from the input layer straight to the output layer, skipping the hidden layer of memory blocks.

I have hardly proven that this is a bad idea in general, but I do suspect that in most real-world problems it is less than optimal. Thus, at some point in the future I’m certainly planning to implement a way to limit the allowable connection destinations, or at least to make some destinations less likely than others.
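One possible shape for such a restriction (purely a sketch; the layer names and weights are made up) is a relative-probability table over source and destination layers:

```scala
// Relative probabilities for proposing a connection from one layer to
// another: 0.0 forbids a destination, values below 1.0 discourage it.
sealed trait Layer
case object InputLayer extends Layer
case object HiddenLayer extends Layer
case object OutputLayer extends Layer

def destinationWeight(from: Layer, to: Layer): Double = (from, to) match {
  case (InputLayer, OutputLayer) => 0.1  // discourage skipping the hidden layer
  case (_, InputLayer)           => 0.0  // forbid connections back into inputs
  case _                         => 1.0
}
```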

One other thing that might improve performance would be the ability to make previously learned structure more rigid after each successive complexification step. Thus, every time the program adds memory blocks to existing networks (or memory cells to blocks), it would lower the probability that previously learned connections are changed in the next stages of evolution.
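The rigidity idea could be realized, for instance, by giving each connection an age and decaying its mutation probability as the network is complexified (a sketch with made-up constants):

```scala
// Each complexification step ages the existing connections; older
// connections are then mutated with exponentially lower probability.
case class AgedConn(weight: Double, age: Int)

def mutationProb(c: AgedConn, base: Double = 0.1, decay: Double = 0.5): Double =
  base * math.pow(decay, c.age)

def afterComplexification(conns: Vector[AgedConn]): Vector[AgedConn] =
  conns.map(c => c.copy(age = c.age + 1))
```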