PhD Progress: Expanding Pre-Goal State Covering

Pre-goals were initially used by the algorithm to work towards a single achievable goal state, best illustrated in Blocks World. They are still somewhat useful in Ms. PacMan, but because Ms. PacMan’s goal is much more variable, the concept of a singular ‘pre-goal’ state is less useful there.
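As a rough illustration of what I mean by a pre-goal (a toy sketch only: the facts and episodes below are invented, and the actual unification is more involved than simply intersecting ground facts), in Blocks World the pre-goal for on(a,b) can be thought of as the facts common to the states recorded just before the goal held:

    # Toy illustration only: every fact and episode here is invented.
    # Each entry is the state recorded one step before the goal on(a,b) held.
    pre_goal_observations = [
        {"clear(a)", "clear(b)", "on(a,c)", "onFloor(b)", "onFloor(c)"},
        {"clear(a)", "clear(b)", "on(b,c)", "onFloor(a)", "onFloor(c)"},
    ]

    # A naive 'pre-goal state': only the facts common to every observation.
    pre_goal = set.intersection(*pre_goal_observations)
    print(pre_goal)  # {'clear(a)', 'clear(b)', 'onFloor(c)'}

With a single fixed goal, observations like these all accumulate against the same target; in Ms. PacMan there is no single target to accumulate them against.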

What I really need in Ms. PacMan is perhaps simply a pre-reward scheme. Or maybe something which mirrors the optimal agent’s decisions. The example in question is determining a good rule for fromPowerDot. By itself, fromPowerDot is unlikely to ever be included in a pre-goal state, and if it is, it will probably include useless information. Even trying to use a pre-reward scenario probably wouldn’t work, as fromPowerDot is a strategic action, not an obvious reward-achieving action.

So the only option left is to use behavioural cloning on the optimal agent’s behaviour, attempting to form a pre-goal from the states in which a given action is used. But as a general learning scheme, this upsets Blocks World behaviour (among other behaviours, no doubt).
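Roughly what I mean, as a toy sketch (the function, the trace, and every predicate below are invented for illustration, and a real version would have to generalise over the ground terms rather than just intersecting them):

    # Toy sketch: build an action-specific pre-goal by recording the state every
    # time the optimal agent chooses that action, then keeping the shared facts.

    def pre_goal_for_action(trace, action_name):
        """trace: list of (facts, action) pairs observed from the optimal agent."""
        observations = [facts for facts, action in trace if action == action_name]
        if not observations:
            return set()
        # Hard logical 'unification' by intersection of ground facts.
        return set.intersection(*map(set, observations))

    trace = [
        ({"ghostsNotEdible", "distanceToPowerDot(2)", "ghostDistance(ghost1,far)"}, "fromPowerDot"),
        ({"ghostsNotEdible", "distanceToPowerDot(3)", "ghostDistance(ghost2,far)"}, "fromPowerDot"),
        ({"distanceToPowerDot(9)", "ghostDistance(ghost1,near)"}, "toDot"),
    ]
    print(pre_goal_for_action(trace, "fromPowerDot"))  # {'ghostsNotEdible'}

Even in this tiny example the relational facts about distances vanish, because a ground intersection cannot generalise over ghost1/ghost2 or the differing values, which points at the same rigidity discussed below.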

Furthermore, another problem with the pre-goal scheme is that it is ‘hard’ logical learning: it doesn’t allow probabilities (which is odd, considering the amount of probability present in the rest of the learner). This, as I have said before, needs to be modified so that it behaves probabilistically.
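One possible direction, sketched only (the data and fact names are invented, and this is the sort of thing I mean rather than a worked-out solution): instead of keeping only the facts present in every observed pre-goal state, keep every fact together with the fraction of observations it appeared in, and let rule creation weight facts by that value.

    # Toy sketch of a 'soft' pre-goal: weight each fact by how often it appears
    # across the observed pre-goal states instead of demanding it appear in all.
    from collections import Counter

    def probabilistic_pre_goal(observations):
        """observations: list of fact-sets seen just before the goal/reward."""
        counts = Counter(fact for facts in observations for fact in facts)
        total = len(observations)
        return {fact: count / total for fact, count in counts.items()}

    observations = [
        {"clear(a)", "clear(b)", "onFloor(c)"},
        {"clear(a)", "clear(b)", "on(c,d)"},
        {"clear(a)", "clear(b)", "onFloor(c)"},
    ]
    print(probabilistic_pre_goal(observations))
    # clear(a) and clear(b) come out at 1.0; onFloor(c) at ~0.67 instead of
    # being thrown away outright by a hard intersection.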

I think perhaps the pre-goal learning needs an entire overhaul, though I’m not sure of any solutions yet. I’ll have to have a good think about it.