PhD Progress: Pacman has a brain!

After some weeks (I’m not entirely sure how many; probably about 4-5) I have successfully re-implemented most of the RL Pacman paper by Szita and Lorincz. I say most because I have left out two of the possible action modules, but they were rarely used in the paper itself and didn’t seem all that useful anyway.
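For context, Szita and Lorincz learn their rule lists with the cross-entropy method: sample a population of candidate policies, score each one, keep the best fraction p, and shift the sampling distribution towards those elites. Below is a minimal sketch of that loop. It is simplified to one independent "include this rule?" probability per candidate rule (the paper's per-slot rule distributions and priorities are left out), and a toy score stands in for actually playing Pacman. The names cross_entropy and toy_score, and the smoothing rate alpha, are my own assumptions rather than anything from the paper.

```python
import math
import random


def cross_entropy(score, n_params, population=100, p=0.1, iterations=20, alpha=0.6, seed=0):
    """Binary cross-entropy method (simplified sketch).

    Each parameter is an independent Bernoulli probability of including a rule.
    Returns the best sample seen, its score, and the final probabilities.
    """
    rng = random.Random(seed)
    probs = [0.5] * n_params
    n_elite = max(1, round(p * population))
    best, best_score = None, -math.inf
    for _ in range(iterations):
        # Sample a population of candidate policies and score each one.
        samples = [[rng.random() < q for q in probs] for _ in range(population)]
        scored = sorted(((score(s), s) for s in samples), key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        # Keep the top p fraction and move each probability towards the elite frequency.
        elites = [s for _, s in scored[:n_elite]]
        for i in range(n_params):
            freq = sum(e[i] for e in elites) / n_elite
            probs[i] = (1 - alpha) * probs[i] + alpha * freq
    return best, best_score, probs


# Hypothetical toy objective: count how many "rules" match a target selection.
TARGET = [True, False, True, True, False, False, True, False, True, True]


def toy_score(sample):
    return sum(a == b for a, b in zip(sample, TARGET))


best, best_score, probs = cross_entropy(toy_score, len(TARGET))
print(best_score, [round(q, 2) for q in probs])
```

With a real agent, score would instead play one or more games with the sampled rule list and return the points earned.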

Although I haven’t done a full test run as they did (population=1000, episodes=50, p=0.05), I have done a shorter run that demonstrates some learning (or so I assume; I only record the best policy). Perhaps I should add some testing criteria that record the mean and SD of the scores in an episode, giving an estimate of how effective the generators are. Anyway, the best policy from a shorter test run (population=100, episodes=20, p=0.1) is:
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if NEAREST_GHOST>6.0 then FROM_GHOST-
[1]: if NEAREST_GHOST<4.0 then FROM_GHOST+
[2]: if MAX_JUNCTION_SAFETY<3.0 then TO_SAFE_JUNCTION+
[2]: if CONSTANT>0.0 then FROM_GHOST+
[3]: if NEAREST_GHOST>5.0 then FROM_GHOST-
[3]: if CONSTANT>0.0 then TO_SAFE_JUNCTION+
[3]: if MAX_JUNCTION_SAFETY>5.0 then FROM_GHOST-
[3]: if MAX_JUNCTION_SAFETY<1.0 then FROM_GHOST+
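Each rule above tests one observation against a threshold and switches an action module on (+) or off (-) at the bracketed priority. A minimal sketch of reading the listing that way, under two assumptions of mine: CONSTANT is always 1.0, and a module's state is decided by the first firing rule that mentions it in priority order, so lower-priority rules never override higher ones. The actual agent may resolve such clashes differently, and parse_rule and active_modules are hypothetical names.

```python
import re
from dataclasses import dataclass

# The best policy from the listing, one rule per line (duplicate kept).
POLICY = """
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if NEAREST_GHOST>6.0 then FROM_GHOST-
[1]: if NEAREST_GHOST<4.0 then FROM_GHOST+
[2]: if MAX_JUNCTION_SAFETY<3.0 then TO_SAFE_JUNCTION+
[2]: if CONSTANT>0.0 then FROM_GHOST+
[3]: if NEAREST_GHOST>5.0 then FROM_GHOST-
[3]: if CONSTANT>0.0 then TO_SAFE_JUNCTION+
[3]: if MAX_JUNCTION_SAFETY>5.0 then FROM_GHOST-
[3]: if MAX_JUNCTION_SAFETY<1.0 then FROM_GHOST+
"""

# "[priority]: if OBSERVATION<op>THRESHOLD then MODULE<+ or ->"
RULE_RE = re.compile(r"\[(\d+)\]: if (\w+)([<>])([\d.]+) then (\w+)([+-])")


@dataclass(frozen=True)
class Rule:
    priority: int
    observation: str
    op: str
    threshold: float
    module: str
    switch_on: bool

    def fires(self, obs):
        value = obs[self.observation]
        return value > self.threshold if self.op == ">" else value < self.threshold


def parse_rule(line):
    match = RULE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"cannot parse rule: {line!r}")
    priority, name, op, threshold, module, sign = match.groups()
    return Rule(int(priority), name, op, float(threshold), module, sign == "+")


def active_modules(rules, obs):
    """Map each switched-on module to its priority (assumed resolution scheme)."""
    decided = {}
    # sorted() is stable, so rules keep their listed order within a priority.
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.module in decided or not rule.fires(obs):
            continue
        decided[rule.module] = rule.priority if rule.switch_on else None
    return {module: prio for module, prio in decided.items() if prio is not None}


rules = [parse_rule(line) for line in POLICY.strip().splitlines()]
for ghost in (3.0, 5.0, 8.0):
    obs = {"CONSTANT": 1.0, "NEAREST_GHOST": ghost, "MAX_JUNCTION_SAFETY": 4.0}
    print(ghost, active_modules(rules, obs))
```

Under these assumptions FROM_GHOST is on at priority 1 when the nearest ghost is closer than 4, on at priority 2 between 4 and 6, and off beyond 6, while TO_DOT is always on at priority 1.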

Obviously there are some redundant rules, but the policy basically works as follows:

  • The default behaviour is to eat dots. That’s what Pacman is all about. I think this behaviour was absent from Szita and Lorincz’s agent because they may have ended the episode once all the dots were eaten, so their agent maximised the score over a single map instead of over the entire game.
  • If a ghost comes near, run from it instead of chasing dots; once it moves away again, go back to eating dots.
  • At a junction where priority one (probably eating dots) allows several directions, choose the direction away from the ghosts. The other priority-two action (TO_SAFE_JUNCTION+) will not matter, as it is always overridden by the ghost running.
  • Finally, there are just some further ghost-running rules. The last one seems redundant, but hey, it’s there.

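The junction behaviour in the third point is a form of lexicographic tie-breaking: the highest-priority modules narrow down the legal directions, and lower-priority modules only choose among the ones left. A minimal sketch, assuming each active module can be summarised as the set of directions it prefers; choose_direction and the direction names are hypothetical, and the real agent may combine modules differently.

```python
def choose_direction(legal, preferences):
    """Narrow the legal directions one priority level at a time.

    legal: iterable of direction names.
    preferences: (priority, set of preferred directions) per active module,
    where priority 1 is the most important.
    Returns the remaining candidates, sorted for determinism.
    """
    candidates = set(legal)
    for _, preferred in sorted(preferences, key=lambda pref: pref[0]):
        narrowed = candidates & set(preferred)
        if narrowed:  # skip a module that would rule out every remaining option
            candidates = narrowed
        if len(candidates) == 1:
            break  # nothing left for lower priorities to decide
    return sorted(candidates)


legal = {"UP", "DOWN", "LEFT", "RIGHT"}
preferences = [
    (1, {"UP", "RIGHT"}),    # e.g. TO_DOT: two equally near dots
    (2, {"RIGHT", "DOWN"}),  # e.g. FROM_GHOST: away from a ghost on the left
]
print(choose_direction(legal, preferences))  # ['RIGHT']
```

Here priority one (dot eating) leaves UP and RIGHT tied, and priority two (ghost running) breaks the tie. A module whose preferred directions don't overlap the remaining candidates is simply ignored.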
This agent earns an average score of 69,050 points over 3 games. Note that this includes points from Fruit, which occasionally appears and is worth 750 points when eaten.
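An average over 3 games says little about how consistent the agent is. Recording the spread as well, as suggested earlier, only needs Python's statistics module. A minimal sketch; summarise_scores is a hypothetical helper, and the per-game scores it takes would come from whatever the test harness logs.

```python
import statistics


def summarise_scores(scores):
    """Mean, sample standard deviation and standard error of per-game scores."""
    if len(scores) < 2:
        raise ValueError("need at least two games to estimate a spread")
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD, n - 1 denominator
    standard_error = sd / len(scores) ** 0.5
    return mean, sd, standard_error


# Toy numbers only, not results from the agent.
print(summarise_scores([10, 20, 30]))
```

With only a handful of games the SD is itself a rough estimate, so more games per test would tighten it.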