Firstly, some news: Blocks World finally works! A problem with updating the state had made it unworkable. It is also quick, at 5 minutes for a single run (50 minutes for a 10-fold run) in a 5 block world. The agent learns, too: performance begins at -23 and then improves to an optimal value of -1. That value may itself point to a problem in the environment, as -1 implies that both a and b are free (clear) in the starting state.
Perhaps this is a problem with the environments and the performance measure. The performance text outputs the average score of the elites, rather than that of the total population. Perhaps the agent can get by simply using clear(a)&clear(b)->move(a,b). If so, the blocks world states may have to be generated properly, rather than with the ad-hoc approach I use. Further inspection of the policies leads me to believe that they only use one rule.
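To make the suspicion concrete, here is a minimal Python sketch (not the actual experiment code). It assumes a goal of on(a,b), a reward of -1 per step, and a hypothetical cap of 25 steps per episode; the state encoding and all function names are made up for illustration. It shows that the single rule clear(a)&clear(b)->move(a,b) scores -1 whenever both blocks start clear, and that averaging only the elites can hide how badly the rest of the population does.

```python
from statistics import mean

FLOOR = "floor"  # assumed encoding: state maps each block to what it rests on


def is_clear(state, block):
    """A block is clear when no other block rests on it."""
    return all(support != block for support in state.values())


def single_rule_policy(state):
    """clear(a) & clear(b) -> move(a, b); returns None when the rule does not fire."""
    if is_clear(state, "a") and is_clear(state, "b"):
        return ("a", "b")
    return None


def run_episode(state, policy, max_steps=25):
    """Return the episode score: -1 per step until on(a, b) holds or the cap is hit."""
    state = dict(state)  # work on a copy so the caller's state is untouched
    total = 0
    for _ in range(max_steps):
        if state["a"] == "b":  # assumed goal: a sits on b
            break
        total -= 1
        action = policy(state)
        if action is not None:
            block, target = action
            state[block] = target
    return total


def elite_mean(scores, n_elite):
    """Average of the n_elite best scores only."""
    return mean(sorted(scores, reverse=True)[:n_elite])


both_clear = {"a": FLOOR, "b": FLOOR, "c": FLOOR}
a_buried = {"a": FLOOR, "b": FLOOR, "c": "a"}  # c sits on a, so a is not clear

easy_score = run_episode(both_clear, single_rule_policy)
hard_score = run_episode(a_buried, single_rule_policy)

# Hypothetical population: two good policies, eight that never reach the goal.
scores = [easy_score] * 2 + [hard_score] * 8
print(easy_score, hard_score, elite_mean(scores, 2), mean(scores))
```

If the ad-hoc state generator mostly produces states like `both_clear`, the one-rule policy would look optimal even though it never unstacks anything.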
Also, the current algorithm isn’t using regeneration or reintegration (or mutation) among the rules. This is evident because in another test (quick again: 10 minutes for a 10 block world) the best rule found is non-optimal (onFloor(a)&clear(b)->move(a,b)).
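One possible reading of why that rule is non-optimal, sketched in Python under my own assumptions (a move needs both the moved block and the target to be clear; the state encoding and function names are hypothetical, not taken from the real code): the onFloor(a) condition misses states where a is clear but stacked on another block, and it fires in states where a is on the floor but buried.

```python
FLOOR = "floor"  # assumed encoding: state maps each block to what it rests on


def is_clear(state, block):
    """A block is clear when no other block rests on it."""
    return all(support != block for support in state.values())


def on_floor(state, block):
    """A block is on the floor when it rests directly on the floor."""
    return state[block] == FLOOR


def clear_rule_fires(state):
    """Condition of clear(a) & clear(b) -> move(a, b)."""
    return is_clear(state, "a") and is_clear(state, "b")


def floor_rule_fires(state):
    """Condition of onFloor(a) & clear(b) -> move(a, b)."""
    return on_floor(state, "a") and is_clear(state, "b")


def move_is_legal(state, block, target):
    """Assumed legality check: both the moved block and the target must be clear."""
    return is_clear(state, block) and is_clear(state, target)


stacked = {"a": "c", "b": FLOOR, "c": FLOOR}  # a is clear but sits on c
buried = {"a": FLOOR, "b": FLOOR, "c": "a"}   # a is on the floor, under c

for name, state in [("stacked", stacked), ("buried", buried)]:
    print(
        name,
        "clear rule:", clear_rule_fires(state),
        "floor rule:", floor_rule_fires(state),
        "move legal:", move_is_legal(state, "a", "b"),
    )
```

In the `stacked` state the onFloor rule stays silent although move(a,b) would solve the task, and in the `buried` state it fires although the move cannot be carried out under the assumed legality check.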
Transfer Learning for Reinforcement Learning Domains: A Survey
Distracted by code…