I think this approach has been tried before, which is a plus, but I am noting it down here as it may be important.
When learning a task such as onAB with the CE algorithm, it would be helpful for the agent to first learn doClear(X) (where X is a or b), and then move a onto b. As Bernhard was saying, this could be achieved by checking the state immediately before goal completion. From those states, TILDE or some other relational classifier could be used to find the common attributes (in this case clear(a) and clear(b)). This goal pre-condition realisation needs to happen early on to be of value.
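A minimal sketch of the pre-condition discovery step, under the simplifying assumption that each state is just a set of ground literals (strings). A real relational learner like TILDE would generalise over variables; a plain intersection of the pre-goal states already surfaces clear(a) and clear(b) for onAB. The episode data below is hypothetical.

```python
def common_preconditions(pre_goal_states):
    """Return literals present in every state observed just before the goal."""
    states = iter(pre_goal_states)
    common = set(next(states))
    for state in states:
        common &= state  # keep only literals shared by all pre-goal states
    return common

# Two hypothetical onAB episodes, each captured one step before
# on(a,b) is achieved.
episode1 = {"clear(a)", "clear(b)", "on(c,d)", "onfloor(a)"}
episode2 = {"clear(a)", "clear(b)", "on(d,c)", "onfloor(b)"}

print(common_preconditions([episode1, episode2]))
# both episodes share clear(a) and clear(b)
```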
Once the agent realises that a and b must be clear, it could temporarily modify its goal to clear(a) (or clear(X)). It could then learn a policy for clear(X), store it, and reuse it in the onAB task.
This really only applies in the onAB task, but it is an idea for creating modules. Modules could be created for all sorts of predicates in this manner (on(X,Y), highest(X), etc.). There may be an issue with using a predicate within the policy that solves that same predicate, but I don't see any immediate problems.
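A sketch of the module idea: once a sub-goal such as clear(X) has been identified, the learned policy is stored under the predicate's name and reused by later tasks. learn_policy here is a hypothetical stand-in for whatever the CE algorithm would actually produce; only the cache-and-reuse structure is the point.

```python
modules = {}  # predicate name -> stored policy

def get_module(predicate, learn_policy):
    """Fetch the stored policy for a predicate, learning it on first use."""
    if predicate not in modules:
        modules[predicate] = learn_policy(predicate)
    return modules[predicate]

# Hypothetical learner: just a placeholder policy labelled by its goal.
def learn_policy(predicate):
    return f"policy-for-{predicate}"

# The onAB task can invoke the clear(X) module twice (once per block),
# but the policy is only learned once.
p1 = get_module("clear(X)", learn_policy)
p2 = get_module("clear(X)", learn_policy)
assert p1 is p2  # second call reuses the stored module
```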