Firstly, balls. I have been reading the paper on FOXCS and it’s basically a better version of what I’ve been doing. Hell, it’s even inspired some extra ideas. It has all the mutation operators I need.
I think the cross-entropy (CE) system is ill suited to first-order logic. It may have worked with the feature-based Pacman, but not so much with the first-order version. The rule generation procedure just isn’t working. You can’t expect randomly generated rules to coalesce into a nicely formed policy, especially in Blocks World, where you can’t move a block unless it’s clear. That requires every rule to state that the block being moved is clear (and that the destination is clear too). The odds of a randomly generated rule including both conditions are slim.
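To make the clear precondition concrete, here’s a minimal sketch of which moves are actually legal in a Blocks World state. It assumes states are sets of ground facts using hypothetical predicate names (`on`, `onFloor`, `clear`) and a special `"floor"` destination; these are illustrative, not the representation used in my code or in FOXCS.

```python
# Hypothetical representation: a state is a frozenset of ground facts such as
# ("on", "a", "b"), ("onFloor", "b"), ("clear", "a").

def is_clear(state, block):
    """A block is clear if nothing is stacked on top of it."""
    return ("clear", block) in state

def valid_moves(state, blocks):
    """Return every legal move(X, Y): X must be clear, and Y must either be
    another clear block or the floor (if X isn't already on the floor)."""
    moves = []
    for x in blocks:
        if not is_clear(state, x):
            continue  # can't pick up a block with something on it
        if ("onFloor", x) not in state:
            moves.append(("move", x, "floor"))
        for y in blocks:
            if y != x and is_clear(state, y):
                moves.append(("move", x, y))
    return moves

# a is stacked on b; b and c are on the floor.
state = frozenset({
    ("on", "a", "b"), ("onFloor", "b"), ("onFloor", "c"),
    ("clear", "a"), ("clear", "c"),
})
print(valid_moves(state, ["a", "b", "c"]))
```

Only three of the possible move(X, Y) pairs survive here, which is the point: a rule whose conditions don’t mention `clear` for both arguments will often propose one of the illegal ones.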
The idea I’ve taken from the FOXCS paper is to use covering to generate rules: when no rule matches the current state, build one directly from that state. This way, it can be guaranteed that the rule’s conditions describe a valid state and that any actions coming from it (assuming we have the set of valid actions per step) are valid too. I can’t expect these things to be learned randomly.
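Here’s a rough sketch of what covering could look like, assuming the same hypothetical ground-fact state representation as above and that the action passed in comes from the per-step set of valid actions. It’s a simplified take (consistently replacing block names with variables), not FOXCS’s actual covering operator; the function name `cover` and the variable naming scheme are made up for illustration.

```python
def cover(state, action):
    """Build a (conditions, action) rule from one concrete state/action pair
    by consistently replacing block names with variables X0, X1, ...
    Because the conditions are copied from a real state in which the action
    was legal, the rule matches that state and proposes a valid action."""
    mapping = {}

    def var(term):
        if term == "floor":  # treat the floor as a constant, not a variable
            return term
        if term not in mapping:
            mapping[term] = f"X{len(mapping)}"
        return mapping[term]

    # Variablize the action first so its arguments become X0, X1, ...
    rule_action = (action[0],) + tuple(var(t) for t in action[1:])
    conditions = frozenset(
        (fact[0],) + tuple(var(t) for t in fact[1:]) for fact in sorted(state)
    )
    return conditions, rule_action

state = frozenset({
    ("on", "a", "b"), ("onFloor", "b"), ("onFloor", "c"),
    ("clear", "a"), ("clear", "c"),
})
conditions, rule_action = cover(state, ("move", "a", "c"))
print(rule_action)         # ('move', 'X0', 'X1')
print(sorted(conditions))
```

The covered rule automatically contains `clear(X0)` and `clear(X1)`, the very conditions random generation struggles to hit. It’s also maximally specific, so something else (presumably the mutation operators) would have to generalize it.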
So I’m quite fucked off. Firstly, the approach I’ve been working on for the past few months is shit, and secondly, if I were to take a better approach, it has already been done. I certainly hope Kurt can offer some guidance. Perhaps I’ll return to my original plan of SRRL. The CE approach was only ever meant to be something to get me acquainted.