An algorithm of sorts describing the covering (and mutating) process:
- The agent is at (S, A(S)), i.e. in state S with valid actions A(S), and no rules in its policy fire.
- Create covering rules R: C -> A for each action A in A(S). Some or all of these rules may already be at maximum generality (and this is known).
- At every state, attempt to further generalise the rules that are not yet at maximum generality, until all rules are maximally general.
- For every rule that reaches maximum generality, create mutated specialisations of the rule using the pre-goal state (one complete trace is given to the agent at the start of the experiment as a teacher).
- Continue running the agent through the environment over further episodes, noting each pre-goal state and using these to form a general description of the minimal pre-goal state. Also record the final action the agent took.
- Once the minimal pre-goal state has settled (no changes to it for X iterations), create further mutated specialisations of the maximally general rules, and remove existing mutations (from the first pre-goal) that are not valid mutations of the minimal pre-goal state.
- For the action: if the action taken by the agent is always the same (same predicate and same constant terms), create a rule whose conditions are the minimal pre-goal facts relevant to that action (e.g. if the action is on(a,b), the conditions are all facts concerning ‘a’ and ‘b’). This rule should be perfect, so fix it in a slot.
- If the action isn’t always the same but always uses the same predicate, then it is a general action concerning particular predicates. If the action is general, then the pre-goal states are general too, and both need to be generalised (see further notes).
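The action-handling steps above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes facts and actions are Python tuples such as ("clear", "a") and ("move", "a", "b") (the blocks-world move action is an assumption), and the names `Rule`, `classify_final_actions` and `rule_from_final_actions` are hypothetical. The "same predicate" case only signals that generalisation is needed; the generalisation itself is left out.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Fact = Tuple[str, ...]  # (predicate, term, ...), e.g. ("clear", "a")


class Rule(NamedTuple):
    conditions: FrozenSet[Fact]
    action: Fact
    fixed: bool  # True: rule is placed in a fixed slot and no longer mutated


def classify_final_actions(final_actions: List[Fact]) -> str:
    """Compare the final actions recorded across episodes."""
    first = final_actions[0]
    if all(a == first for a in final_actions):
        return "identical"       # same predicate and same constant terms
    if all(a[0] == first[0] for a in final_actions):
        return "same_predicate"  # general action: action and pre-goal need generalising
    return "mixed"


def rule_from_final_actions(final_actions: List[Fact],
                            minimal_pregoal: FrozenSet[Fact]) -> Optional[Rule]:
    """Build a fixed rule only when every recorded final action is identical."""
    if classify_final_actions(final_actions) != "identical":
        return None  # generalisation is needed first; not handled in this sketch
    action = final_actions[0]
    terms = set(action[1:])
    # Conditions: every minimal pre-goal fact that mentions a term of the action.
    conditions = frozenset(f for f in minimal_pregoal if terms & set(f[1:]))
    return Rule(conditions, action, fixed=True)


pregoal = frozenset({("clear", "a"), ("clear", "b"), ("clear", "c"), ("block", "a")})
rule = rule_from_final_actions([("move", "a", "b")] * 3, pregoal)
print(sorted(rule.conditions))  # [('block', 'a'), ('clear', 'a'), ('clear', 'b')]
```

Facts about blocks not named in the action (here clear(c)) are left out of the fixed rule, matching the "all predicates concerning ‘a’ and ‘b’" description.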
Regarding unifying pre-goal states into a minimal state: when the states are stored and unified, the covering unifier need only note the constants present in the goal. So for the onAB case, we only need to check that a and b appear under the same predicates in each pre-goal (clear). If they don’t (in some other case), it is not yet clear what to do. Generalise them?
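A minimal sketch of the goal-constant check, assuming facts are Python tuples such as ("clear", "a"). It treats "under the same predicates" as the same predicate at the same argument position; the position requirement is an assumption, added so that on(a,b) and on(b,a) are not confused. `predicates_of` and `goal_constants_agree` are hypothetical names, and the sketch only detects disagreement; what to do on failure is still the open question above.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Fact = Tuple[str, ...]  # (predicate, term, ...), e.g. ("clear", "a")


def predicates_of(state: FrozenSet[Fact], const: str) -> Set[Tuple[str, int]]:
    """(predicate, argument position) pairs under which `const` appears in `state`."""
    return {(f[0], i) for f in state for i, t in enumerate(f[1:]) if t == const}


def goal_constants_agree(state_a: FrozenSet[Fact], state_b: FrozenSet[Fact],
                         goal_constants: Iterable[str]) -> bool:
    """True if every goal constant sits under the same predicates in both pre-goals."""
    return all(predicates_of(state_a, c) == predicates_of(state_b, c)
               for c in goal_constants)


# onAB example: a and b are clear in both pre-goals, so the check passes.
pg1 = frozenset({("clear", "a"), ("clear", "b"), ("clear", "c")})
pg2 = frozenset({("clear", "a"), ("clear", "b"), ("on", "c", "d")})
print(goal_constants_agree(pg1, pg2, ["a", "b"]))  # True
```

Facts that mention only non-goal terms (clear(c), on(c,d)) are ignored, since the unifier only needs to track the goal constants.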
In the other case, where goals have no constants, the pre-goal states are stored mostly as anonymous variables, with a few named variables. The named variables are the terms used in the action (with its constants swapped for variables); every other term in the state is swapped for an anonymous variable.
Question! Is it necessary to store groups (i.e. counts) of facts that share the same predicate and use only anonymous variables? Probably not. It is best not to store these groups, as the rules developed need to be general and should not care whether 5 anonymous blocks are clear or only 4. It probably wouldn’t matter anyway, as the rules created will likely disregard these anonymous variables and concern themselves only with the variables and constants.
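The storage step can be sketched as below. This is a minimal illustration under stated assumptions: facts are Python tuples, goal constants are kept as-is, action terms become named variables ?X0, ?X1, ... (that naming scheme is an assumption), and every other term becomes the anonymous variable "_". `generalise_pregoal` is a hypothetical name. Returning a `frozenset` means duplicate anonymous facts collapse into one, so counts such as "four clear blocks" are never stored, in line with the "probably not" answer above.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Fact = Tuple[str, ...]  # (predicate, term, ...), e.g. ("clear", "a")
ANON = "_"              # anonymous variable


def generalise_pregoal(state: Iterable[Fact], final_action: Fact,
                       goal_constants: Iterable[str] = ()) -> FrozenSet[Fact]:
    """Keep goal constants, turn action terms into variables, anonymise the rest."""
    goal = set(goal_constants)
    var_names: Dict[str, str] = {}
    for term in final_action[1:]:
        if term not in goal and term not in var_names:
            var_names[term] = f"?X{len(var_names)}"

    def swap(term: str) -> str:
        if term in goal:
            return term
        return var_names.get(term, ANON)

    # A set, so repeated anonymous facts (e.g. several clear(_)) are stored once.
    return frozenset((f[0], *(swap(t) for t in f[1:])) for f in state)


state = {("clear", "a"), ("clear", "b"), ("clear", "c"), ("clear", "d"), ("on", "c", "e")}
print(sorted(generalise_pregoal(state, ("move", "a", "b"))))
# [('clear', '?X0'), ('clear', '?X1'), ('clear', '_'), ('on', '_', '_')]
```

With goal_constants=("a", "b") (the onAB case) the same call keeps clear(a) and clear(b) instead of introducing variables.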
So basically, when a pre-goal state is stored, every term in the state is generalised to an anonymous variable except terms used in the final action and terms in the goal. However, during unification, if these non-anonymous terms don’t unify, they become anonymous, unless the non-unification is caused by the fact’s predicate existing in one pre-goal but not in the other.
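One possible reading of this unification rule, as a minimal sketch: the running minimal state is unified one-directionally with each newly stored pre-goal. Each fact keeps its non-anonymous terms where they match its closest same-predicate fact in the new pre-goal and has them made anonymous where they do not. When the new pre-goal has no fact with that predicate at all, the fact is dropped. Dropping such facts, and choosing the "closest" fact by counting agreeing argument positions, are both assumptions; `unify_pregoals` is a hypothetical name, and facts are the same generalised tuples as before.

```python
from typing import FrozenSet, Tuple

Fact = Tuple[str, ...]  # e.g. ("clear", "?X0") or ("on", "_", "_")
ANON = "_"


def unify_pregoals(minimal: FrozenSet[Fact], new_pregoal: FrozenSet[Fact]) -> FrozenSet[Fact]:
    """Unify the current minimal pre-goal state with a newly stored pre-goal."""
    result = set()
    for fact in minimal:
        # Candidate partners: same predicate and arity (sorted for determinism).
        candidates = sorted(f for f in new_pregoal
                            if f[0] == fact[0] and len(f) == len(fact))
        if not candidates:
            continue  # predicate absent from the new pre-goal: drop the fact (assumption)
        # Closest partner: the one agreeing on the most argument positions.
        best = max(candidates,
                   key=lambda f: sum(x == y for x, y in zip(fact[1:], f[1:])))
        # Terms that fail to unify become anonymous.
        result.add((fact[0], *(x if x == y else ANON for x, y in zip(fact[1:], best[1:]))))
    return frozenset(result)


minimal = frozenset({("clear", "?X0"), ("clear", "?X1"), ("on", "?X0", "_"), ("above", "?X1", "_")})
new = frozenset({("clear", "?X0"), ("clear", "_"), ("on", "_", "_")})
print(sorted(unify_pregoals(minimal, new)))
# [('clear', '?X0'), ('clear', '_'), ('on', '_', '_')]
```

Because the minimal state is only ever compared against new pre-goals and never extended from them, it can only shrink or become more anonymous, which fits the idea of waiting for it to settle over X iterations.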