Ms. PacMan's performance seems to have decreased, possibly because of the ‘broken’ mutation operators currently in use. The unification process was modified so that general rules now contain ranges, but the mutation process does not yet have the capability to mutate those ranges. As a result, PacMan achieves less reward.
A Ms. PacMan experiment has been running recently (40 hours in, though half of that was just testing) and has produced the following results at about 62% completion:
This is significantly worse than the previous experiment over the regular environment. While there is some initial improvement in the rules, performance doesn’t appear to be increasing very fast, and the last few results could simply have been flukes.
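To make the missing capability concrete, here is a minimal sketch of one way a range mutation could work: split the numeric interval in a betweenRange test into equal sub-ranges and emit one specialised rule per sub-range. This is an assumption about what the operator might look like, not the operator actually used in the experiment; the function name `mutate_ranges`, the `parts` parameter and the equal-width split are all hypothetical.

```python
import re

# Matches "(betweenRange <var> <lo> <hi>)" as it appears in the rule text.
RANGE = re.compile(
    r"\(betweenRange (\S+) (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)"
)


def mutate_ranges(rule, parts=2):
    """Return `parts` copies of `rule`, each covering one equal-width
    slice of the first betweenRange interval (hypothetical operator)."""
    m = RANGE.search(rule)
    if m is None:
        return []  # nothing to mutate: the rule has no range test
    var, lo, hi = m.group(1), float(m.group(2)), float(m.group(3))
    width = (hi - lo) / parts
    children = []
    for i in range(parts):
        a, b = lo + i * width, lo + (i + 1) * width
        new_test = f"(betweenRange {var} {a} {b})"
        # Splice the narrowed test back into the original rule string.
        children.append(rule[:m.start()] + new_test + rule[m.end():])
    return children


rule = ("(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) "
        "(pacman player) => (fromGhost ?X ?__Num7)")
for child in mutate_ranges(rule):
    print(child)
```

Under this scheme the single 0.0 to 52.0 fromGhost rule would become a near-ghost rule (0.0 to 26.0) and a far-ghost rule (26.0 to 52.0), which could then be weighted separately.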
The readable generator is:
A typical policy:
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceGhostCentre player ?X ?__Num5&:(betweenRange ?__Num5 0.0 51.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -16.0 28.0)) => (toJunction ?X ?__Num4)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 0.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2)
(distanceGhostCentre player ?X ?__Num8&:(betweenRange ?__Num8 0.0 51.0)) (pacman player) => (toGhostCentre ?X ?__Num8)
(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 0.0 52.0)) (pacman player) => (toGhost ?X ?__Num6)
(distanceFruit player fruit ?__Num9&:(betweenRange ?__Num9 0.0 52.0)) (pacman player) => (toFruit fruit ?__Num9)
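Each rule above pairs one distance predicate with one action and a single numeric range. As a rough aid for reading them, the following sketch pulls the action, predicate and range bounds out of a rule string. The regular expression is written against the rule text shown here only, and the name `summarise` is hypothetical.

```python
import re

# Captures: condition predicate, range lower bound, range upper bound, action.
RULE = re.compile(
    r"^\((\w+) .*?\(betweenRange \S+ (-?[\d.]+) (-?[\d.]+)\)\).*=> \((\w+) "
)


def summarise(rule):
    """Return (action, predicate, lo, hi) for a rule, or None if the
    rule does not follow the single-range pattern shown above."""
    m = RULE.match(rule)
    if m is None:
        return None
    predicate, lo, hi, action = m.groups()
    return action, predicate, float(lo), float(hi)


rules = [
    "(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) "
    "(pacman player) => (toDot ?X ?__Num3)",
    "(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -16.0 28.0)) "
    "=> (toJunction ?X ?__Num4)",
]
for r in rules:
    print(summarise(r))
```

Laid out this way, it is easier to see that every rule spans one wide range rather than a specialised slice of it.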
toDot is clearly the highest-weighted behaviour, followed by fromGhost. Normally, I’d prefer fromGhost to be weighted highest, but because it only contains one (or two) all-encompassing rules, that would mean the agent spends most of its time cowering in the corner. toPowerDot is above fromPowerDot, which is generally a good choice, and toGhost and toGhostCentre are both weighted low. Strangely, toFruit is quite low, possibly because the fruit doesn’t turn up very often. In fact, the fruit may be what makes the difference between the scores from this run and the previous one.
On the bright side, the faster optimisation seems to be working: this run is due to be completed in about 24 hours, making a total time of 63 hours.
I need to fix this range mutation and also fix the mutation towards useful to/fromGhost rules that include the binary attributes edible/aggressive. Furthermore, the modularisation for learning clear still hasn’t been fully solved, as sub-optimal rules are being chosen as the best.
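As a rough illustration of the second fix, a mutation towards ghost rules with binary attributes could append an attribute test, or its negation, to a rule's conditions. This is only a sketch under assumptions: the condition syntax `(edible ?X)` and `(not (edible ?X))`, the function name `specialise_with_attributes` and its parameters are hypothetical, and only the attribute names edible and aggressive come from the notes above.

```python
def specialise_with_attributes(rule, attributes=("edible", "aggressive"),
                               var="?X"):
    """Return one child rule per attribute and polarity, each adding a
    single binary test to the rule's conditions (assumed syntax)."""
    conditions, action = rule.split(" => ", 1)
    children = []
    for attr in attributes:
        # Positive test: the ghost bound to `var` has the attribute.
        children.append(f"{conditions} ({attr} {var}) => {action}")
        # Negated test: the ghost bound to `var` lacks the attribute.
        children.append(f"{conditions} (not ({attr} {var})) => {action}")
    return children


rule = ("(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 0.0 52.0)) "
        "(pacman player) => (toGhost ?X ?__Num6)")
for child in specialise_with_attributes(rule):
    print(child)
```

With children like these, a toGhost rule restricted to edible ghosts could earn a high weight without dragging the agent towards dangerous ghosts as well.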