PhD Progress: Performance Decrease

It seems the performance of Ms. PacMan has decreased, possibly due to the ‘broken’ mutation operators currently in use. Because the unification process was modified so that general rules now contain numerical ranges, the mutation process no longer mutates these ranges (it lacks the capability at the moment). As a result, Ms. PacMan achieves less reward.
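For reference, the kind of range mutation that is currently missing might look something like the sketch below: given a rule condition's numerical range, produce specialised child ranges. The splitting strategy and function names here are hypothetical, purely to illustrate the idea.

```python
import random

def mutate_range(low, high, num_splits=2):
    """Illustrative range-mutation operator (strategy is an assumption):
    specialise a range condition like betweenRange 0.0 52.0 by producing
    uniform sub-ranges plus one narrower range around a sampled point."""
    width = high - low
    children = []
    # Uniform sub-ranges, e.g. [0, 52] -> [0, 26] and [26, 52].
    step = width / num_splits
    for i in range(num_splits):
        children.append((low + i * step, low + (i + 1) * step))
    # A narrower range centred on a random point within the original.
    centre = random.uniform(low, high)
    half_width = width / (2 * num_splits)
    children.append((max(low, centre - half_width),
                     min(high, centre + half_width)))
    return children
```

Each child range could then replace the parent's range in a specialised copy of the rule, letting the distribution converge towards useful numeric bounds.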

A Ms. PacMan experiment was run recently (40 hours in, though half of that time was just testing), with the following results at about 62% completion:
2064.0
3259.2
3441.8
3574.0334
3836.2334
3582.9666
3420.2
3421.8333
3715.6667
3890.8667
3555.1333
3555.1667
3542.5
3654.1667
3400.3667
3484.4
3443.6667
3809.3333
3616.8667
3061.9
2986.3
3770.3667
3762.3667
3142.9333
3247.0334
3266.5
4481.4
4112.433
4163.967
3327.4333
3934.3
4198.533

This is significantly worse than the previous experiment over the regular environment. While the rules show some initial improvement, performance doesn't appear to be increasing very quickly, and the last few results could simply be flukes.
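As a quick sanity check on the "some initial improvement" claim, comparing the first and second halves of the scores listed above (plain Python, nothing more than averages):

```python
# The per-checkpoint averages listed above, for a quick trend check.
scores = [
    2064.0, 3259.2, 3441.8, 3574.0334, 3836.2334, 3582.9666, 3420.2,
    3421.8333, 3715.6667, 3890.8667, 3555.1333, 3555.1667, 3542.5,
    3654.1667, 3400.3667, 3484.4, 3443.6667, 3809.3333, 3616.8667,
    3061.9, 2986.3, 3770.3667, 3762.3667, 3142.9333, 3247.0334,
    3266.5, 4481.4, 4112.433, 4163.967, 3327.4333, 3934.3, 4198.533,
]

half = len(scores) // 2
first_half = sum(scores[:half]) / half
second_half = sum(scores[half:]) / (len(scores) - half)
print(f"overall mean: {sum(scores) / len(scores):.1f}")
print(f"first half:   {first_half:.1f}")
print(f"second half:  {second_half:.1f}")
```

The second half does average higher than the first, so there is a measurable upward trend, but it is modest and noisy enough that the late spike could still be flukes.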

A typical policy from the readable generator:
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceGhostCentre player ?X ?__Num5&:(betweenRange ?__Num5 0.0 51.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -16.0 28.0)) => (toJunction ?X ?__Num4)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 0.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2)
(distanceGhostCentre player ?X ?__Num8&:(betweenRange ?__Num8 0.0 51.0)) (pacman player) => (toGhostCentre ?X ?__Num8)
(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 0.0 52.0)) (pacman player) => (toGhost ?X ?__Num6)
(distanceFruit player fruit ?__Num9&:(betweenRange ?__Num9 0.0 52.0)) (pacman player) => (toFruit fruit ?__Num9)
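For anyone unfamiliar with the Jess-style syntax above, the `?__NumN&:(betweenRange ...)` construct binds a numerical variable and tests it against a range. A minimal Python mirror of that test (assuming inclusive bounds, which the original syntax doesn't make explicit):

```python
def between_range(value, low, high):
    """Mirror of the betweenRange test in the rule conditions:
    true when low <= value <= high (inclusive bounds assumed)."""
    return low <= value <= high

# Hypothetical check of the toDot rule's condition:
# (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0))
def to_dot_fires(distance_to_dot):
    return between_range(distance_to_dot, 0.0, 52.0)
```

So a rule like the toDot one fires for any dot within the bound range of the observed distance values, which is why un-mutated ranges leave the rules essentially all-encompassing.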

Obviously, toDot behaviour is weighted highest, followed by fromGhost behaviour. Normally I'd prefer fromGhost to be the highest-weighted, but because it only contains one (or two) all-encompassing rules, that would mean the agent spends most of its time cowering in a corner. toPowerDot is above fromPowerDot, generally a good choice, and toGhost and toGhostCentre are both lowly weighted. Strangely, toFruit is quite low, possibly because the fruit doesn't turn up very often. In fact, the fruit may account for much of the difference between this run's scores and the previous one's.

On the bright side, the faster optimisation seems to be working: this run is due to complete in about 24 hours, for a total time of 63 hours.

I need to fix this range mutation, and also fix the mutation towards useful toGhost/fromGhost rules that include the binary attributes edible/aggressive. Furthermore, the modularisation for learning clear still hasn't been fully solved, as sub-optimal rules are being chosen as the best.
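The edible/aggressive fix amounts to a specialisation mutation that adds a binary ghost attribute as an extra condition. A rough sketch of what that could look like, where the rule representation and attribute predicates are my assumptions, not the actual implementation:

```python
import random

def specialise_ghost_rule(rule, rng=random):
    """Illustrative specialisation mutation: add one of the binary ghost
    attributes (edible/aggressive) as an extra condition, so that e.g. a
    general fromGhost rule can become 'flee only aggressive ghosts'."""
    attribute = rng.choice(["(edible ?X)", "(aggressive ?X)"])
    if attribute in rule["conditions"]:
        return rule  # already specialised with this attribute
    return {
        "conditions": rule["conditions"] + [attribute],
        "action": rule["action"],
    }

from_ghost = {
    "conditions": ["(distanceGhost player ?X ?__Num7)", "(pacman player)"],
    "action": "(fromGhost ?X ?__Num7)",
}
child = specialise_ghost_rule(from_ghost)
```

With both specialisations in the pool, the distribution can learn to weight "flee aggressive ghosts" highly while leaving "flee edible ghosts" near zero, instead of having to accept or reject fleeing wholesale.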