I have finished the code for covering a state when no rules fire. It appears to be quite stable and usually finds the most general rules. However, this can also be a problem, because the most general conditions for an action (i.e. the basic action preconditions) may not make the best rules for the job. For example, stack requires highest(X), which goes beyond the basic condition of clear(X).
Not only is there a problem with generality, but also with goal constants (onAB). The onAB problem requires the constant terms a and b to be present in the rules for an optimal policy, and as yet the algorithm does not introduce them. I feel like I’ve addressed this problem before but I can’t find where.
A possibility for heuristic specialisation is to note the state observed just before the goal was achieved (the pre-goal state) and perform a direct specialisation, or perhaps a number of specialisations, on the rule (creating a bunch of mutations). The parent rule can then be removed, as at least one of the children should be on the right track towards an optimal rule.
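A minimal sketch of that mutation step, assuming rules are stored as a set of condition strings plus an action string. The `Rule` class, the `specialise` helper, and the `move(X,Y)` action are hypothetical names used only for illustration; the real rule representation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a set of condition strings and one action string."""
    conditions: frozenset
    action: str


def specialise(rule, pre_goal_conditions):
    """Create one child per pre-goal condition that the rule does not yet have."""
    extra = sorted(set(pre_goal_conditions) - rule.conditions)
    return [Rule(rule.conditions | {c}, rule.action) for c in extra]


# Assumed example: a maximally general parent and a pre-goal state that
# mentions both highest(Y) and the goal constant a.
parent = Rule(frozenset({"clear(X)", "clear(Y)"}), "move(X,Y)")
pre_goal = {"clear(X)", "clear(Y)", "highest(Y)", "clear(a)"}
children = specialise(parent, pre_goal)
```

Each child adds exactly one condition, so the parent can be dropped afterwards without losing any of the candidate specialisations.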
Another issue is covering that does not find the most general rule. For instance, covering would sometimes create rules that included the onFloor(X) condition (tied to clear(X)) simply because the actions used for covering all dealt with blocks on the floor. The only solution I can currently think of is to find the actions proposed by the rule, compare them against the valid actions of the same type, and ensure they match. If there are more valid actions than the rule proposes, then the rule is not general enough. But this could undo the progress made by the specialisation mutation in the paragraph above.
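The comparison could look something like the sketch below, assuming ground actions are tuples whose first element is the action type. The `missing_actions` helper and the action names are hypothetical, not part of the existing code.

```python
def missing_actions(proposed, valid, action_type):
    """Return the valid actions of the given type that the rule fails to propose."""
    same_type = {a for a in valid if a[0] == action_type}
    return same_type - set(proposed)


# Assumed example: blocks a and b are on the floor, c is not.
valid = {
    ("move", "a", "b"),
    ("move", "b", "a"),
    ("move", "c", "a"),
    ("moveFloor", "c"),
}
# A rule carrying the onFloor(X) condition only proposes moves of floor blocks.
proposed = {("move", "a", "b"), ("move", "b", "a")}

missing = missing_actions(proposed, valid, "move")
not_general_enough = bool(missing)
```

A non-empty result means the rule is over-specific for this state. A deliberately specialised mutant would also fail this check, which is the conflict noted above.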
Perhaps during covering the unification needs to occur over ALL rules, not just until it senses no change. This still has a small chance of creating onFloor(X) rules. Maybe I could mark which rules have attained maximum generality and which haven’t, and keep covering every rule that hasn’t until it does (or until it has seen enough states, as attaining maximum generality may be impossible). Using this marking system, rules which are mutants of maximally general rules can avoid being re-generalised.
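A rough sketch of the marking idea, under several assumptions: conditions are plain strings, set intersection stands in for the real unification step, and a rule counts as maximally general once its conditions equal the action's basic preconditions. `CoveredRule` and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CoveredRule:
    """Hypothetical rule record carrying the generality markers."""
    conditions: set
    preconditions: frozenset          # basic preconditions of the action
    states_seen: int = 0
    from_maximal_parent: bool = False  # mutant of a maximally general rule

    @property
    def maximal(self):
        # Assumption: maximum generality means only the preconditions remain.
        return self.conditions == set(self.preconditions)

    def needs_covering(self, max_states=100):
        """Keep covering until maximal, or until enough states have been seen."""
        if self.from_maximal_parent:
            return False  # mutants are never re-generalised
        return not self.maximal and self.states_seen < max_states

    def generalise(self, observed):
        """Intersection as a stand-in for unifying with one more observation."""
        self.conditions &= set(observed)
        self.states_seen += 1


pre = frozenset({"clear(X)", "clear(Y)"})
rule = CoveredRule({"clear(X)", "clear(Y)", "onFloor(X)"}, pre)
observations = [
    {"clear(X)", "clear(Y)", "onFloor(X)"},  # causes no change, but keep going
    {"clear(X)", "clear(Y)", "highest(Y)"},
]
for obs in observations:
    if rule.needs_covering():
        rule.generalise(obs)

mutant = CoveredRule({"clear(X)", "clear(Y)", "highest(Y)"}, pre,
                     from_maximal_parent=True)
```

Here the first observation leaves the rule unchanged, yet covering continues because the rule is not marked maximal, and the second observation removes onFloor(X). The mutant keeps highest(Y) because it is exempt from further covering.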