I have been running a Ms. Pac-Man experiment for the past 2.5 days to produce results comparable to Szita and Lorincz's. While it isn't complete (still approx. 3-4 days left), the results obtained so far are interesting. I figured Ms. Pac-Man would need a ton of episodes to create useful behaviour, but it converged at 20000 episodes (reasonable for a large domain, perhaps). The end value was 5172 (6117 on the previous test). The final policy was reasonable, though the fromGhost rule could use a (not (edible ?X)) predicate (which was one of the last to be pruned).
Another run is also at the 23000 episode mark, with a value of about 5500. It has a similar policy (though less converged), and has a larger chance of finding an (edible) & (not (blinking)) toGhost rule. However, it lacks a fromGhost rule, probably because toPowerDot comes first. Another converged at the 2300 mark with a value of about 6100 (though only a few episodes ago it was at 300, due to a bad toGhost rule). That policy was again close to perfect, but still a bit short of ideal.
Another is at 2300 episodes but is not performing as well: it's at about 2500, after dropping from 5000 a few tests ago. Unfortunately it is using (non-edible) toGhost rules at the beginning, which are only loosely backed by lower-ranked toPowerDot rules. And the last one has not yet converged at 3500 episodes, with a close-to-good policy.
Judging by these results, it looks as though 3 lives doesn't really make a difference to the algorithm – which sucks. The agent is disadvantaged by the fact that it cannot hover around powerdots when ghosts are distant. Perhaps it would be better to change the decision structure back to singular(ish) actions, where close decisions are split by the next action down the list. That may help the agent.
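A minimal sketch of what that tie-breaking decision structure might look like: the policy is an ordered list of rules, the top-ranked rule normally decides alone, and only when its best actions are "close" does the next rule down the list split the decision. The names, the scoring interface, and the EPSILON closeness threshold are all my own assumptions, not the agent's actual code.

```python
EPSILON = 0.1  # closeness threshold for "close decisions" (assumed value)

def choose_action(rules, state, actions, eps=EPSILON):
    """Pick an action from an ordered rule list with tie-breaking.

    Each rule is a callable scoring (state, action); higher is better.
    A rule decides outright when it has a clear winner; otherwise the
    tied leaders are passed down to the next rule in the list.
    """
    candidates = list(actions)
    for rule in rules:
        scores = {a: rule(state, a) for a in candidates}
        ranked = sorted(candidates, key=lambda a: scores[a], reverse=True)
        if len(ranked) == 1:
            return ranked[0]
        best, second = ranked[0], ranked[1]
        if scores[best] - scores[second] > eps:
            return best  # clear winner: this rule decides singularly
        # close decision: keep only the near-tied leaders and defer
        # the split to the next rule down the list
        candidates = [a for a in ranked if scores[best] - scores[a] <= eps]
    return candidates[0]  # no rule separated them; take the first
```

The idea is that each rule acts singular(ish)ly, and lower-ranked rules only ever see the handful of actions the rules above them could not separate.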