Well, after fixing the bug in the environment, PacMan is performing better, but is still capped at around the 4000-point mark. The likely reason is that there are no useful rules regarding ghost avoidance/chasing. I’m guessing either the pre-goal was never met by the optimal policy, or the goal was met under differing states: edible and non-edible ghosts. Furthermore, it seems I have made it slower. Whether this is a result of it performing better, or of my changing the number of steps required, I am not sure. I suspect both are involved, but at least I can modify the latter. I should still have the old code for it somewhere in the SVN repository.
The pre-goal model is useless for these sorts of heuristic mutations, so I will likely have to introduce a new algorithm for creating them, either alongside or in place of the existing pre-goal one. Perhaps maintaining a tree of rules, with the covered rule at the base and progressively more specialised rules branching down from it. The tree would need to be prunable too. This is beginning to look a lot like TG, though.
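A minimal sketch of what that specialisation tree might look like: the covered (most general) rule sits at the root, each child appends one extra condition, and subtrees whose estimated value drops below a threshold get pruned. All the names here (RuleNode, specialise, prune) and the toy rule strings are hypothetical, not from the actual codebase.

```python
class RuleNode:
    def __init__(self, rule, value=0.0):
        self.rule = rule          # e.g. "toGhost(X) :- ghost(X)"
        self.value = value        # estimated worth of the rule
        self.children = []        # more specialised variants

    def specialise(self, extra_condition, value=0.0):
        """Create a child rule by appending one extra condition."""
        child = RuleNode(self.rule + ", " + extra_condition, value)
        self.children.append(child)
        return child

    def prune(self, threshold):
        """Drop whole subtrees whose root value falls below threshold."""
        self.children = [c for c in self.children if c.value >= threshold]
        for c in self.children:
            c.prune(threshold)

    def size(self):
        """Count this node plus all surviving descendants."""
        return 1 + sum(c.size() for c in self.children)


root = RuleNode("toGhost(X) :- ghost(X)")
edible = root.specialise("edible(X)", value=5.0)
edible.specialise("blinking(X)", value=-1.0)
root.specialise("not edible(X)", value=-3.0)

root.prune(0.0)      # removes both negative-value branches
print(root.size())   # → 2 (the root and the edible(X) child remain)
```

Pruning on subtree-root value alone is the crude option; a TG-style refinement would keep statistics per node and only split or discard once enough examples have been seen.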
A secondary problem is that actions not directly concerned with particular objects in the domain may still perform better with conditions linked to those objects (e.g. toPowerDot only when the ghosts are not edible). If actions didn’t have delayed effects, this could easily be found by watching the post-action state, but alas, toPowerDot could be active at any point, regardless of PacMan’s distance to a powerdot. At the moment I cannot see a solution to this problem without expensive state watching. Perhaps just the to/fromGhost actions will have to suffice for now.
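For what the "expensive state watching" might amount to, here is a rough sketch: log the state each time a delayed action like toPowerDot is selected, and once its effect finally shows up (a power dot gets eaten), credit all the selection-time states so their conditions (e.g. ghost edibility) can be correlated with the outcome. The trace format, the `dotEaten`/`ghostsEdible` keys, and the function name are all placeholders of mine, not the real environment interface.

```python
def credit_delayed_action(trace, action, effect):
    """trace: list of (state_dict, action_name) steps.
    Returns the states in which `action` was chosen before `effect`
    later became true, so their conditions can be correlated."""
    credited = []
    pending = []                      # states awaiting the effect
    for state, act in trace:
        if act == action:
            pending.append(state)
        if effect(state):
            credited.extend(pending)  # all pending choices get credit
            pending = []
    return credited


trace = [
    ({"ghostsEdible": False, "dotEaten": False}, "toPowerDot"),
    ({"ghostsEdible": False, "dotEaten": False}, "toPowerDot"),
    ({"ghostsEdible": True,  "dotEaten": True},  "fromGhost"),
]
hits = credit_delayed_action(trace, "toPowerDot",
                             lambda s: s["dotEaten"])
print(len(hits))   # → 2: both toPowerDot choices preceded the eaten dot
```

The cost is exactly the worry above: every step must be logged and scanned per action/effect pair, which is why restricting attention to the to/fromGhost actions is the cheaper fallback.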