Well, I started the grand experiment(s) yesterday at about noon (I can’t remember the exact time) and I have come back to check on them. One thing I thought of after starting them was to make sure they output their progress after every episode (once the generators have been updated) so that they can pick up where they left off.
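As a rough idea of what per-episode progress output could look like, here is a minimal Python sketch. The function names, the JSON layout, and the idea that the generator state fits in a plain dictionary are all assumptions for illustration, not the actual experiment code.

```python
import json
import os
import tempfile


def save_checkpoint(path, episode, generator_state):
    """Write the episode index and generator state to `path` atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"episode": episode, "generators": generator_state}, handle)
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path):
    """Return (episode, generator_state), or a fresh start if no checkpoint exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path) as handle:
        data = json.load(handle)
    return data["episode"], data["generators"]
```

An experiment would call `load_checkpoint` once at start-up and `save_checkpoint` at the end of each episode, so a restart loses at most one episode of work.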

However, the problem I found was not crashed experiments; they were all working fine. The problem was infinite PacMan. Two of the three experiments had scores in the millions, and similarly large numbers of spare lives, and the third was exploiting a bug in the code where PacMan moves back and forth along the wall at a warp point.

So, the first issue can be solved by having a level limit. Perhaps I’ll stick with 10 levels. The nature of the experiment will then differ from that of the paper, but it always did anyway. If the agent completes all 10 levels, the episode ends.
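The level limit amounts to one extra condition in the episode loop. The sketch below uses a hypothetical `ToyGame` stand-in (it clears one level per step) purely to show the termination logic; the real game class and its fields will differ.

```python
MAX_LEVELS = 10  # the cap suggested above


class ToyGame:
    """Hypothetical stand-in for the real game: clears one level per step."""

    def __init__(self):
        self.level = 1
        self.lives = 3
        self.score = 0

    def step(self):
        self.score += 100
        self.level += 1


def run_episode(game, max_levels=MAX_LEVELS):
    """Play until PacMan runs out of lives or has cleared `max_levels` levels."""
    while game.lives > 0 and game.level <= max_levels:
        game.step()
    return game.score
```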

The second issue seems to be a bug in the ghost code. Although PacMan was practically staying in the same place, the ghosts were just moving around in circles. A ghost’s movement is determined by a target point which it moves towards. However, this movement algorithm doesn’t seem to take into account warp points (paths in the level that go out one wall and come in the opposite wall). I rarely see a ghost take one of these warp paths, if ever! A further method of halting infinite games (perhaps PacMan is able to outmanoeuvre a ghost forever) is to simply place step limits on the episodes. Something quite big oughta do it.
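One way the ghosts could account for warp points is to measure distance along the maze rather than straight towards the target. The sketch below is only an illustration under assumed conventions: the maze is a list of strings with `#` for walls, positions are `(x, y)` tuples, and warps are modelled as horizontal wrap-around wherever both edge cells are open. None of this is taken from the actual ghost code.

```python
from collections import deque


def shortest_path_length(grid, start, goal):
    """Breadth-first path length on a grid of '#' walls with horizontal wrap-around."""
    height, width = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (x, y), dist = queue.popleft()
        if (x, y) == goal:
            return dist
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # Taking x modulo the width lets a step off one side re-enter on the other.
            nx, ny = (x + dx) % width, y + dy
            if 0 <= ny < height and grid[ny][nx] != "#" and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append(((nx, ny), dist + 1))
    return None  # goal not reachable from start
```

A ghost that picks the neighbouring cell with the smallest `shortest_path_length` to its target would use a warp path whenever it is genuinely shorter.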

Perhaps it is time to give the ghosts individual behaviour while I’m at it too. Currently, all the ghosts have the same greedy, but lonely, behaviour. A ghost will chase the player, but maintain its distance from other ghosts. This needs to be changed to the proper ghost logic seen in PacMan (or Ms. PacMan, in this case).
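For reference, the current "greedy, but lonely" behaviour could be expressed roughly as below. This is a guess at the shape of the logic, not the real implementation: the Manhattan distance, the `min_gap` threshold, and the crowding weight of 2 are all made-up values for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_move(ghost, player, others, moves, min_gap=3):
    """Pick the move that closes on the player while keeping clear of other ghosts."""

    def cost(move):
        nxt = (ghost[0] + move[0], ghost[1] + move[1])
        # Penalise ending up closer than `min_gap` to any other ghost.
        crowding = sum(max(0, min_gap - manhattan(nxt, other)) for other in others)
        return manhattan(nxt, player) + 2 * crowding

    return min(moves, key=cost)
```

Giving each ghost its own behaviour would then mean swapping in a different target or cost function per ghost instead of sharing this one.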

Anyway, here are the two policies that were all about achieving infinite points:

`[1]: if CONSTANT>0.0 then TO_DOT+`

[1]: if NEAREST_ED_GHOST<99.0 and NEAREST_POWER_DOT<5.0 then FROM_POWER_DOT+

[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+

[2]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-

[2]: if CONSTANT>0.0 then FROM_GHOST_CENTRE+

[2]: if NEAREST_GHOST>7.0 then FROM_GHOST-

[2]: if MAX_JUNCTION_SAFETY<1.0 then FROM_GHOST+

[2]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-

[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+

[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+

[2]: if NEAREST_POWER_DOT>10.0 then FROM_POWER_DOT-

[3]: if NEAREST_ED_GHOST<99.0 then FROM_POWER_DOT+

[3]: if NEAREST_GHOST>7.0 then FROM_GHOST-

[3]: if NEAREST_GHOST<4.0 then FROM_GHOST+

`[1]: if NEAREST_GHOST<3.0 then FROM_GHOST+`

[1]: if MAX_JUNCTION_SAFETY<2.0 then FROM_GHOST+

[1]: if GHOST_DENSITY<1.5 and NEAREST_POWER_DOT<5.0 then FROM_POWER_DOT+

[1]: if MAX_JUNCTION_SAFETY<2.0 then FROM_GHOST+

[1]: if CONSTANT>0.0 then TO_DOT+

[1]: if MAX_JUNCTION_SAFETY>3.0 then FROM_GHOST-

[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+

[2]: if CONSTANT>0.0 then TO_DOT+

[2]: if CONSTANT>0.0 then FROM_GHOST+

[2]: if NEAREST_ED_GHOST>99.0 then FROM_POWER_DOT-

[2]: if NEAREST_ED_GHOST<99.0 then FROM_POWER_DOT+

[2]: if MAX_JUNCTION_SAFETY<3.0 then FROM_GHOST+

[2]: if NEAREST_GHOST>7.0 then FROM_GHOST-

[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+

[3]: if NEAREST_ED_GHOST<99.0 then TO_POWER_DOT-

[3]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-

[3]: if NEAREST_ED_GHOST>99.0 then FROM_POWER_DOT-

Big policies, but they were still from the early part of the cross-entropy process. They seem to basically focus on getting all the dots, and avoiding ghosts when near.
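To make the rule format above easier to work with, here is a small Python sketch that parses one rule line and checks whether its conditions hold. The `Rule` type and function names are hypothetical, and reading the trailing `+`/`-` as switching an action on or off is an assumption about the semantics rather than something taken from the agent code.

```python
import re
from typing import NamedTuple

RULE_RE = re.compile(r"^\[(\d+)\]: if (.+) then ([A-Z_]+)([+-])$")
COND_RE = re.compile(r"^([A-Z_]+)([<>])(\d+(?:\.\d+)?)$")


class Rule(NamedTuple):
    priority: int
    conditions: list  # (observation, operator, threshold) triples
    action: str
    switch_on: bool  # assumed meaning: '+' switches the action on, '-' off


def parse_rule(line):
    """Parse a line such as '[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+'."""
    match = RULE_RE.match(line.strip().strip("`"))
    if match is None:
        raise ValueError(f"not a rule: {line!r}")
    priority, cond_text, action, sign = match.groups()
    conditions = []
    for part in cond_text.split(" and "):
        cond = COND_RE.match(part)
        if cond is None:
            raise ValueError(f"not a condition: {part!r}")
        name, op, value = cond.groups()
        conditions.append((name, op, float(value)))
    return Rule(int(priority), conditions, action, sign == "+")


def fires(rule, observations):
    """Return True if every condition of the rule holds for the observations."""
    return all(
        observations[name] < value if op == "<" else observations[name] > value
        for name, op, value in rule.conditions
    )
```

For example, `parse_rule("[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+")` yields a priority-1 rule that fires whenever the `NEAREST_GHOST` observation is below 5.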