It seems modularisation may need to be introduced now. The specialisation step of the rule-generating covering algorithm requires more than just random mutations of states, as there are far too many states to cover from.
The covering algorithm will generate (in one to three covering steps)
on(X,_) & clear(X) -> moveFloor(X) and
clear(X) & clear(Y) -> move(X,Y).
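As a sanity check on the rule format, here is a minimal sketch of matching these two rules against a blocks-world state. The state encoding (sets of ground atoms), the onFloor predicate, and all function names are my own assumptions for illustration, not part of the algorithm:

```python
from itertools import permutations, product

# A state is a set of ground atoms; uppercase strings are variables and
# "_" is an anonymous variable (these conventions are my own assumption).
STATE = {("on", "a", "b"), ("onFloor", "b"), ("onFloor", "c"),
         ("clear", "a"), ("clear", "c")}
BLOCKS = ["a", "b", "c"]

def cond_holds(state, pred, args, objects):
    """Check one condition, filling any '_' slot with some existing object."""
    slots = [i for i, a in enumerate(args) if a == "_"]
    if not slots:
        return (pred, *args) in state
    for fillers in product(objects, repeat=len(slots)):
        trial = list(args)
        for i, f in zip(slots, fillers):
            trial[i] = f
        if (pred, *trial) in state:
            return True
    return False

def matches(conds, state, blocks):
    """Yield every variable binding under which all conditions hold."""
    vars_ = sorted({a for _, *args in conds for a in args
                    if a.isupper() and a != "_"})
    for combo in permutations(blocks, len(vars_)):
        b = dict(zip(vars_, combo))
        if all(cond_holds(state, p, [b.get(a, a) for a in args], blocks)
               for p, *args in conds):
            yield b

move_floor = [("on", "X", "_"), ("clear", "X")]   # -> moveFloor(X)
move       = [("clear", "X"), ("clear", "Y")]     # -> move(X, Y)

print(list(matches(move_floor, STATE, BLOCKS)))  # [{'X': 'a'}]
print(list(matches(move, STATE, BLOCKS)))
# [{'X': 'a', 'Y': 'c'}, {'X': 'c', 'Y': 'a'}]
```

Note how general these rules are: in this three-block state the move rule already fires in two ways, which is why specialisation is needed at all.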
An example of the algorithm working: the unstack goal. Note that the moveFloor rule is already optimal for unstack, so no specialisation is necessary (the precondition could equally be highest(X); either rule works fine).
The stack goal is slightly trickier, in that it requires the Y in the move action to also satisfy highest(Y). This can be achieved by noting the pre-goal state. While the agent may not recognise the exact predicate needed, it will generate three specialisations: highest(Y), on(Y,_), and above(Y,_).
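One way to realise "noting the pre-goal state" — a sketch under my own representational assumptions — is to collect every literal true of the object bound to Y just before the goal held, lift it over Y with other arguments generalised to "_", and offer those literals as candidate specialisations:

```python
def candidate_specialisations(pre_goal_state, obj, var, existing):
    """Literals true of `obj` in the pre-goal state, lifted over `var`.
    Other arguments are generalised to '_'; literals already in the
    rule (`existing`) are skipped."""
    cands = set()
    for pred, *args in pre_goal_state:
        if obj in args:
            lit = (pred,) + tuple(var if a == obj else "_" for a in args)
            if lit not in existing:
                cands.add(lit)
    return cands

# Pre-goal state for stack: b (the destination block) is clear,
# highest, and on c; a is clear on the floor, about to be moved onto b.
pre_goal = {("clear", "a"), ("clear", "b"), ("highest", "b"),
            ("on", "b", "c"), ("above", "b", "c"), ("onFloor", "a")}

print(candidate_specialisations(pre_goal, "b", "Y", {("clear", "Y")}))
```

In this toy pre-goal state the function yields exactly the three candidates above: highest(Y), on(Y,_), and above(Y,_).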
A much harder goal is onAB, which introduces constants as specialisations. Ideally, we want the specialisations to swap X for a and Y for b in the move rule. However, this could take more than one specialisation step (probably two). Even then, once we have reached this point, the policy will not be optimal. Though it will eventually solve the problem, it will do so by first unstacking all or most blocks with the general moveFloor rule (assuming that rule is in the policy).
Here is where modularisation comes in. Modular rules allow conditions over constants to be achieved as subgoals: achieve(clear(a)) pursues the goal of clearing a. So when specialisation generates constant preconditions, we need to arc off into learning to achieve that module and then come back, armed with that knowledge.
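A toy sketch of the control flow I have in mind (the achieve() wrapper, the state encoding, and all helper names are my own illustration): run the module's policy until its goal atom holds, then hand control back with the updated state. The sketch maintains only on/onFloor/clear facts and derives above from the on chain:

```python
def above(state, x, y):
    """x sits (transitively) above y, derived from the on facts."""
    below = {a: b for (p, a, b) in (t for t in state if t[0] == "on")}
    while x in below:
        x = below[x]
        if x == y:
            return True
    return False

def apply_moveFloor(state, x):
    """Move clear block x onto the floor, exposing the block beneath it."""
    b = next(t[2] for t in state if t[0] == "on" and t[1] == x)
    return (state - {("on", x, b)}) | {("onFloor", x), ("clear", b)}

def clear_a_policy(state):
    """The optimal clear(a) rule: above(X, a) & clear(X) -> moveFloor(X)."""
    for t in state:
        if t[0] == "clear" and above(state, t[1], "a"):
            return ("moveFloor", t[1])
    return None

def achieve(goal, state, policy, max_steps=20):
    """Pursue `goal` with the module's policy, then return the new state."""
    for _ in range(max_steps):
        if goal in state:
            return state
        action = policy(state)
        assert action is not None, "module policy is stuck"
        act, x = action
        assert act == "moveFloor"  # the only action this toy module uses
        state = apply_moveFloor(state, x)
    raise RuntimeError("module failed to reach its goal")

# Tower c-b-a: the module must unstack c and b to clear a.
s = frozenset({("on", "c", "b"), ("on", "b", "a"), ("onFloor", "a"),
               ("clear", "c")})
s = achieve(("clear", "a"), s, clear_a_policy)
print(("clear", "a") in s)  # True
```

The point of the wrapper is exactly the "arc off and come back" step: the caller's policy never sees the intermediate moves, only the state in which its constant precondition now holds.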
A problem with modules is that two modules may conflict (e.g. achieve(clear(a)) and achieve(clear(b))): while clearing a, we may cover b. Modules need to be combinable so that both goals are achieved at an optimal rate.
The optimal clearA goal rule is: above(X,a) & clear(X) -> moveFloor(X). In this case moveFloor(X) always leaves X clear and never covers another block, making conflicts impossible, but in other cases this may not hold. Say an alternative optimal clearing rule is: above(X,a) & clear(X) & clear(Y) -> move(X,Y). This could cause clashes if Y were, say, b.
A simple combination of the two would be to join the rules together in a policy. This can still go wrong, though: because the clear goal can be achieved by both move and moveFloor, I can only hope that both rules are present in the policy the goal eventually fixes on.
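One possible shape for "joining the rules in a policy" is a decision list that prefers the move rule but falls back to moveFloor. This is purely my own sketch; in particular, the `protected` guard (Y must not be a block that a sibling module is trying to keep clear) is an assumption I am adding, not something the algorithm currently produces:

```python
def above(state, x, y):
    """x sits (transitively) above y, derived from the on facts."""
    below = {a: b for (p, a, b) in (t for t in state if t[0] == "on")}
    while x in below:
        x = below[x]
        if x == y:
            return True
    return False

def clear_policy(state, target, protected):
    """Decision list for achieve(clear(target)):
    1. above(X,target) & clear(X) & clear(Y), Y unprotected -> move(X, Y)
    2. above(X,target) & clear(X)                           -> moveFloor(X)
    The `protected` guard is my own addition to avoid module conflicts."""
    xs = [t[1] for t in state if t[0] == "clear" and above(state, t[1], target)]
    for x in sorted(xs):
        for t in sorted(state):
            y = t[1] if t[0] == "clear" else None
            if y and y != x and y != target and y not in protected:
                return ("move", x, y)
        return ("moveFloor", x)
    return None

# c is on a; b must stay clear for a sibling achieve(clear(b)) module.
state = {("on", "c", "a"), ("onFloor", "a"), ("onFloor", "b"),
         ("clear", "c"), ("clear", "b")}
print(clear_policy(state, "a", protected={"b"}))  # ('moveFloor', 'c')
print(clear_policy(state, "a", protected=set()))  # ('move', 'c', 'b')
```

With b protected, the fallback moveFloor rule fires instead of move(c,b), which is exactly the clash from the previous paragraph being avoided rather than merely hoped away.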