Had a big brainstorm about covering today, as it seems to be about that point. I should be focusing my efforts into the paper, but I’ll get that done later.
When matching rules, do so iteratively, as in one condition at-a-time. This can simply be achieved by checking if each prerequisite is present (or inferred, in the case of background knowledge). Given the rule clear(X) & on(X,Y) & highest(Z) -> move(X,Z), we can check if clear(X) holds somewhere. If it does, use the instantiated terms for each following predicate using X (on(X,Y)). Say X unifies with ‘a’ and ‘b’ then test if on(‘a’,Y) matches, and on(‘b’,Y) if it fails.
This process will have to be performed recursively. Technically this is already performed by the Unification algorithm, but it exits when a term does not unify.
Once we know which terms unify or not (say on(X,Y) does not match anywhere), we can selectively remove them to generalise the rule. Naturally, some conditions cannot be removed as they may be used in the action, which will need to be checked.
In the other direction, we can specialise rules that don’t always lead to state changes, even though the conditions fire. In this case, we may need an extra predicate, the removal of an existing but not-tied-to-the-action predicate, or the replacement of a tied-to-action predicate with another utilising the same variable. These can be found by covering the current state
Covering, given by FOXCS, simply states that if there are no rules matching the current state, cover new ones by using the current state (or a subset thereof) as the conditions and a valid action as the action. This requires that the observations given to the agent are of the current state, S, and the valid actions to take within the current state, A(S). Rules are created by first selecting an action to cover, where each rule concerns only one action (i.e. move, moveFl, rather than move(a,c), etc). This may not always work in the StarCraft and PacMan domains and warrants investigation.
In StarCraft, we have many actions to choose from, with multiple permutations, Using A(S) greatly reduces the subset of action choices (i.e. can only build barracks when we have X lumber and Y gold or whatever the equivalent is). Another option for reducing actions is by only covering for the action, rather than the specific action, as said above. Otherwise we’ll be looking for a rule to send unit X to the mine, the enemy, the barracks, the other side of the map, etc.
A problem is how much of the state should be covered? This can go two ways: maximal (bounded) covering, or minimal covering.
This is by covering every predicate in the state concerning the terms used in the chosen action. For instance, when covering move(a,c), we only concern ourselves with predicates using ‘a’ or ‘c’. These maximal rules can then be specialised by also covering any other move actions and unioning the resulting rules. It would be inefficient to union all rules, so we union until: the conditions are minimal (i.e. are only predicates that are tied to the action and no others), or the rule doesn’t change after X iterations (X=5?). Also, the order in which we go through the actions is such that we don’t reuse variables in the action. So move(c,e), then move(b,c) (because b != c, c != e), then move(e,b) (e != c, e != b). This order of variable differences should speed the unionisation process.
A working example may help. Given the state:
on(c,a), on(a,d), clear(c), clear(a), clear(b), onFl(d), onFl(e), onFl(b), above(c,a), above(c,d), above(a,d), highest(c).
move(c,e), move(c,b), move(b,e), move(b,c), move(e,c), move(e,b), moveFl(c).
So a possible cover is to use
move(c,e) < - on(c,a) & clear(c) & clear(e) & onFl(e) & above(c,a) & above(c,d) & highest(c)
= move(X,Y) <- on(X,_) & clear(X) & clear(Y) & onFl(Y) & above(X,_) (x2) & highest(X)
This can be unioned with
move(b,c) < - ...
= move(X,Y) <- on(Y,_) & clear(Y) & clear(X) & onFl(X) & above(Y,_) & highest(Y)
=UNIONED= move(X,Y) <- clear(X) & clear(Y) which is exactly all move needs.
When realising the stack goal, this will require some mutation to achieve the optimal, but it's a good step.
Another problem is the use of constants i the goal (onab). Musings for another post, I think.
Minimal covering concerns only using the minimum number of predicates for a valid rule. For instance, a valid rule could be clear(X) & onFl(Y) -> move(X,Y). Of course, these rules are found from the current state and are specialised/mutated based on whether or not they always fire (in the sense that the condition is met, but the action is not). An approach for the minimal covering is to create all (or many) permutations of minimal rules covering the state and using cross-entropy or some other method to realise the best. However, this is inefficient. It has the upside of instantly creating the stacked goal rule, but that rule has to be found through cross-entropy to be realised.
Maximal covering is the most likely choice, but minimal covering (or components thereof) may be required for specialisation.
Modules so far have not been mentioned. I'm not sure how they'll fit in here, as they aren't observations nor actions. At the moment, I see them in this format: Achieve(clear(a)) & Achieve(clear(b)) -> move(a,b). This isn't so much a rule as an action list. Perhaps it could be combined as a sort of policy.
How the agent utilises them could be achieved by using a teacher. The agent observes the state just previous to the goal and attempts to achieve the predicates seen. Of course, this could mean the agent attempts to achieve the EXACT state, which would be bothersome. In any case, if it notices that clear(a) and clear(b) are required, it can work to achieve them. Perhaps it could do the same unionising seen in covering states, but over episodes (e.g. highest(a) is true in one, but not another so it mustn't matter towards the overall goal).