PhD Progress: Bug Fix Improvements

It seems that little bug was what was holding the agent back from attaining better results. Of course, the bug wasn’t always present (I know I introduced it recently), but it’s good to see that the agent is back on its feet, even without the presence of useful fromGhost rules.

Here are the results of the experiment, 62.78% (78.39%) into the first run. Note: this has been updated to show later results, and now covers two experiments.

While these are promising, they come at a high (possibly negotiable) price: time. The experiment has so far taken 94 and a half hours. Out of that time, 75 hours is learning time. That’s almost 4 days of runtime and just over 3 days of learning time, and it’s not even half complete on the first run. As said in the previous post, this is likely a combination of the changes to the learning rate and the fact that the better PacMan does, the longer it takes. According to the ETA, the first run will be complete in 5 more days… Assuming 10 days of learning per run, that’s 100 days to simply produce some reliable results. 3 and a third months!
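To sanity-check the estimate above, here is a rough back-of-the-envelope sketch in Python. The elapsed, learning and ETA figures come from this post; the count of ten runs is my assumption, inferred from the 100-day total, and the 30-day month is just a convenience.

```python
HOURS_PER_DAY = 24

# Figures quoted in the post
elapsed_hours = 94.5    # total runtime so far
learning_hours = 75.0   # portion of that spent learning
eta_days = 5            # remaining time for the first run, per the ETA

elapsed_days = elapsed_hours / HOURS_PER_DAY
learning_days = learning_hours / HOURS_PER_DAY

# Projected length of the first run and how far through it we are
first_run_days = elapsed_days + eta_days
fraction_done = elapsed_days / first_run_days

# Assumption: ten runs of roughly ten days each, which is what
# a 100-day total implies; a month is approximated as 30 days.
days_per_run = 10
num_runs = 10
total_days = days_per_run * num_runs
total_months = total_days / 30

print(f"runtime so far: {elapsed_days:.2f} days")
print(f"learning so far: {learning_days:.2f} days")
print(f"first run progress: {fraction_done:.0%}")
print(f"all runs: {total_days} days, about {total_months:.1f} months")
```

The projected first run comes to a little under 9 days, which is why I round it up to 10 when multiplying out.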

So something needs to be done. Symphony, a shorter learning algorithm, an optimised Ms. PacMan environment: any of them. Otherwise my PhD will be spent waiting on possibly unstable experiments to complete.

A typical readable policy goes like this:
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 10.0)) (pacman player) => (fromGhost ?X ?__Num7) / (distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 1.0)) (pacman player) => (toPowerDot ?X ?__Num1) / (distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 43.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 32.75 43.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 12.25 22.5)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 2.0 12.25)) (pacman player) => (fromPowerDot ?X ?__Num2)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -11.0 0.0)) => (toJunction ?X ?__Num4) / (junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -16.0 -11.0)) => (toJunction ?X ?__Num4)

Note that some rules are not present. Primarily the agent runs from ghosts, though only ghosts within 10 units (a smart move, if they are hostile). Then the agent eats powerdots to keep the ghosts pacified, though the first rule of this slot seems a little useless, as the agent will rarely be within 1 unit of a powerdot, and when it is, it’ll probably be eating it anyway. Perhaps it is used to counterbalance an overarching fromPowerDot rule. Following this is the toDot slot, times three. The agent clearly likes to stack that rule, with all three rules of the slot being active. As a result the agent always pursues dots, but when dots are 0-13 units away, it pursues them with gusto. The fromPowerDot slot seems a little useless, running from distant powerdots, but perhaps it is just how the agent copes with being forced to use that rule (it is likely to disappear with slot removal, as it has a selection chance of 0.58). The junctionSafety slot doesn’t seem to be useful either, perhaps for the same reason as the previous slot.
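The stacking effect in the toDot slot can be illustrated with a small Python sketch. This is not the agent's actual code: `between_range` is my guess at what betweenRange does (an inclusive bounds check), and `count_firing` is a hypothetical helper that simply counts how many of the slot's three range conditions hold for a given dot distance.

```python
def between_range(value, low, high):
    """Assumed semantics of betweenRange: low <= value <= high."""
    return low <= value <= high


# (low, high) bounds of the three toDot rules in the policy above
TO_DOT_RANGES = [(1.0, 13.0), (0.0, 52.0), (0.0, 1.0)]


def count_firing(distance, ranges=TO_DOT_RANGES):
    """Count how many of the rules' range conditions hold at this distance."""
    return sum(between_range(distance, low, high) for low, high in ranges)


for distance in (0.5, 5.0, 30.0):
    print(distance, count_firing(distance))
```

Under these assumptions, a dot anywhere in the 0-13 range matches two rules (or all three at exactly 1 unit), while a more distant dot matches only the broad 0-52 rule. That would be consistent with the agent always chasing dots but favouring the nearby ones.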

When I checked in on the agent previously on the weekend, the fromGhost rule was not the top rule; I think toPowerDot was. Or perhaps toFruit (which is strangely missing: its selection ratio was too low, probably because the rule rarely triggers). So I’m guessing that the point in the graph where performance takes off is where fromGhost was most actively used.
