A problem appeared today as I was making small changes to PacMan that affect the learning. It is a problem I was aware of before, but not in this form.
The problem concerns covering new rules and settling LGG rules from the covered ones. Typically, treating an LGG rule as fixed once it has not changed for X iterations works, in Blocks World anyway. But in PacMan, where a large number of states can go by without any significant change, that threshold is simply not enough. The case I am concerned with is the ghosts, which essentially have two states: aggressive and edible, both of which are presented to the agent through state observations.
The problem is that, when pre-processing the state, Ms. PacMan has only ever seen the ghosts as aggressive, so the LGG rules for moving to/from ghosts assume that aggressive is a constant fact.
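Roughly what happens, as a quick Python sketch of my own (the predicate names and the intersection-style generalisation are just for illustration, not the real representation):

    # Hypothetical illustration: generalising a rule condition as the
    # intersection of the facts observed whenever the action was valid.
    # If every observation so far includes aggressive(ghost), the
    # "generalised" condition wrongly keeps it as a constant fact.

    def generalise(observations):
        """Keep only the facts common to every observation."""
        common = set(observations[0])
        for obs in observations[1:]:
            common &= obs
        return common

    # Early in learning, the agent has only seen aggressive ghosts.
    observations = [
        {"ghost(blinky)", "aggressive(blinky)", "distance(blinky,5)"},
        {"ghost(blinky)", "aggressive(blinky)", "distance(blinky,9)"},
    ]
    print(generalise(observations))
    # contains both ghost(blinky) and aggressive(blinky):
    # aggressive looks like a constant fact until an edible ghost appears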
To solve this problem, and also to make covering more efficient, the agent checks its LGG rule output facts against the known valid actions; if they differ (the valid set has more), covering is triggered.
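Something like this, in sketch form (valid_actions and the rule-produced action set are placeholder names, not the actual code):

    def needs_covering(actions_from_rules, valid_actions):
        """Trigger covering if the environment offers actions that the
        current LGG rules fail to produce."""
        return bool(valid_actions - actions_from_rules)

    # e.g. the rules only propose moving away from (aggressive) ghosts,
    # but the environment also allows moving towards the now-edible ghost.
    rule_actions = {"fromGhost(blinky)"}
    valid_actions = {"fromGhost(blinky)", "toGhost(blinky)"}
    if needs_covering(rule_actions, valid_actions):
        pass  # cover the state: create/generalise rules for the missing actions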
In this sense, LGG rules don't really exist unless the rule is minimal (cannot possibly be generalised any further). So I can do the same with LGG rules as I have done with pre-goal states: consider them settled, but restart learning if they change.
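The bookkeeping would look something like this sketch (the threshold and names are placeholders, not values from the actual system):

    SETTLED_AFTER = 50  # placeholder threshold, not a tuned value

    class LGGRuleTracker:
        def __init__(self, rule):
            self.rule = rule
            self.unchanged_iterations = 0

        def update(self, new_rule):
            """Count stable iterations; a change unsettles the rule and
            signals that learning built on it should restart."""
            if new_rule == self.rule:
                self.unchanged_iterations += 1
                return False  # no restart needed
            self.rule = new_rule
            self.unchanged_iterations = 0
            return True  # rule changed: restart dependent learning

        @property
        def settled(self):
            return self.unchanged_iterations >= SETTLED_AFTER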
If I ever get to StarCraft, this is likely to be helpful, as StarCraft has WAY more states per episode than PacMan. I just hope I won't have to restart learning mid-episode. I don't think I will need to. Meh.
Furthermore, say we believed X for several iterations and had created some mutants from it. I need to investigate simple mutant modification for when an LGG rule based on X is proved false; otherwise I could lose a LOT of learning whenever the agent explores a previously unexplored part of the state space.
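One possible shape for that, purely speculative and with a made-up minimal Rule class (none of these names exist in the actual system): when the parent LGG rule is revised, re-base the existing mutants on the new parent rather than throwing them away, so their learned values survive:

    class Rule:
        def __init__(self, conditions, value_estimate=0.0):
            self.conditions = set(conditions)
            self.value_estimate = value_estimate

        def specialise(self, extra_conditions):
            return Rule(self.conditions | set(extra_conditions))

    def rebase_mutants(mutants, old_parent, new_parent):
        """Keep each mutant's extra conditions and learned value, but
        re-apply them to the revised parent rule instead of discarding
        the mutants when the parent LGG rule is proved false."""
        rebased = []
        for mutant in mutants:
            extra = mutant.conditions - old_parent.conditions
            new_mutant = new_parent.specialise(extra)
            new_mutant.value_estimate = mutant.value_estimate  # preserve learning
            rebased.append(new_mutant)
        return rebased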