PhD Progress: Ms. Pac-Man Single Level, 3 Lives Performance

I have been running a Ms. Pac-Man experiment for the past 2.5 days to produce results comparable against Szita and Lorincz’s. While it isn’t complete (still approx. 3-4 days left), the results obtained so far are interesting. I figured Ms. Pac-Man would need a huge number of episodes to create useful behaviour, but it converged at 20000 episodes (reasonable for a large domain, perhaps). The end value was 5172 (6117 on the previous test). The final policy was reasonable, though the fromGhost rule could use a (not (edible ?X)) predicate (which was one of the last to be pruned).

Another run is at the 23000 episode mark, with a value of about 5500. It also has a similar policy (though less converged), and has a larger chance of finding an (edible) & (not (blinking)) toGhost rule. However, it lacks a fromGhost rule, probably because toPowerDot comes first. Another converged at the 2300 mark with a value of about 6100 (though only a few episodes ago it was at 300, due to a bad toGhost rule). The policy for that one was again close to ideal, but not quite there.

Another is at 2300 episodes, but is not performing as well – it’s at about 2500, after dropping from 5000 a few tests ago. It is unfortunately using toGhost rules (non-edible) at the beginning, which are only loosely backed by lower-ranked toPowerDot rules. And the last one has not yet converged at 3500 episodes, though its policy is close to good.

Judging by these results, it looks as though 3 lives doesn’t really make a difference to the algorithm – which sucks. The agent is disadvantaged by the fact that it cannot hover around powerdots when ghosts are distant. Perhaps it would be better to change the decision structure back to singular(ish) actions, where close decisions are split by the next action down the list. That may help the agent; a rough sketch of what I mean is below.
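To make that concrete, here is a minimal Python sketch of the singular(ish) decision structure, assuming each rule in the ordered policy scores the four movement directions. The score_directions interface and the tie margin are illustrative assumptions, not the actual implementation:

    TIE_MARGIN = 0.1  # how close two direction scores must be to count as a near-tie

    def choose_direction(ordered_rules, state):
        """Fire the highest-ranked applicable rule as a single action,
        breaking near-ties with the next rule down the list."""
        candidates = None  # directions still in contention
        for rule in ordered_rules:
            scores = rule.score_directions(state)  # hypothetical: {direction: weight}
            if candidates is not None:
                scores = {d: w for d, w in scores.items() if d in candidates}
            if not scores:
                continue
            best = max(scores.values())
            close = [d for d, w in scores.items() if best - w <= TIE_MARGIN]
            if len(close) == 1:
                return close[0]        # a clear single winner
            candidates = set(close)    # defer the near-tie to the next rule down
        return next(iter(candidates)) if candidates else None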

PhD Progress: Mario Problems

I suppose it was inevitable that I would have problems with Mario. I was just hoping that it would be ready by the conference. Perhaps some nasty mock-up will be ready anyway. Hopefully it can learn some rudimentary strategies too.

The problem facing me is one of environment definition. Currently, Mario is defined by jumpOn and other such actions. But when playing the game myself, I don’t perform EVERY action as a jump-on action. What I’m saying is that the language I have defined constricts the agent: it cannot fully achieve human-level or better performance when bound to jumping on particular things. What would be ideal is a language defined exactly at Mario’s level, in which the agent has 4 actions (3 if you consider left and right the same action in different directions), and each observation of the environment concerns Mario directly and the relations between objects. A rough sketch of that representation follows.
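As a rough illustration (the predicate and attribute names here are assumptions, not the environment’s actual interface), the low-level action set and relational observations might look something like this:

    LOW_LEVEL_ACTIONS = ["left", "right", "jump", "fire"]  # 3 if left/right are one action

    def observe(state):
        """Return relational facts about Mario and the objects around him."""
        facts = []
        for obj in state.objects:  # hypothetical state interface
            facts.append(("distance", "mario", obj.id, obj.distance_to_mario))
            facts.append(("heightDiff", "mario", obj.id, obj.height_diff))
            if obj.is_enemy:
                facts.append(("enemy", obj.id))
            if obj.is_coin:
                facts.append(("coin", obj.id))
        return facts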

When using such low-level actions, Mario would have to learn higher-level behaviour, like jumping on an enemy. But to do that the agent needs a reward, or incentive. Unfortunately I don’t think a reward is provided when an enemy is killed, and even if it is, it pales in comparison to the reward gained from time. The behaviour may be achieved using modules: a module such as !enemy(X), whose goal is to remove a particular enemy (a sketch is below).
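A minimal sketch of that module idea, assuming the module supplies its own internal reward when the named enemy disappears (all names here are hypothetical):

    class RemoveEnemyModule:
        """Subgoal !enemy(X): remove the enemy bound to X."""
        def __init__(self, enemy_id):
            self.enemy_id = enemy_id  # the X in !enemy(X)

        def internal_reward(self, prev_state, next_state):
            was_present = self.enemy_id in prev_state.enemy_ids  # hypothetical attribute
            now_gone = self.enemy_id not in next_state.enemy_ids
            return 1.0 if (was_present and now_gone) else 0.0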

Another problem is policy determinism. Under the current system, Mario immediately moves to the left-most part of the level and jumps up and down on the left edge because it is closest. The only way for him to progress is to find closer things to jump on, rather than simply proceeding right like a human would. The policies are formed from probabilistic rule sets, so perhaps it would be smarter to somehow make the policies themselves stochastic as well (one option is sketched below). I got around this issue using weighted sums of decisions in Ms. Pac-Man, but in Mario the lines are less blurred. If an enemy is approaching, you want to stomp it, not ‘almost’ stomp it.
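One possible way to make the policy stochastic, sketched under the assumption that each rule carries a learned weight and an applicability test (the rule interface here is illustrative, not the actual one): instead of always firing the highest-ranked applicable rule, sample among the applicable rules in proportion to their weights.

    import random

    def stochastic_choice(rules, state):
        """Sample an applicable rule in proportion to its learned weight."""
        applicable = [r for r in rules if r.applies(state)]  # hypothetical interface
        if not applicable:
            return None
        weights = [r.weight for r in applicable]
        chosen = random.choices(applicable, weights=weights, k=1)[0]
        return chosen.action(state)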

An alternative is to change behaviour on-the-fly if the agent appears to be getting nowhere (jumping up and down on an edge). Maybe re-sample the rule being fired that leads to the repetitive behaviour; something like the check sketched below could trigger that.
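A simple way to detect that kind of repetitive behaviour, assuming Mario’s horizontal position is observable each step (the window size and threshold are made-up values):

    from collections import deque

    WINDOW = 50          # steps to look back
    MIN_PROGRESS = 8.0   # minimum horizontal movement expected over the window

    recent_x = deque(maxlen=WINDOW)

    def stuck(mario_x):
        """True if Mario has barely moved horizontally over the last WINDOW steps."""
        recent_x.append(mario_x)
        return (len(recent_x) == WINDOW
                and max(recent_x) - min(recent_x) < MIN_PROGRESS)

When stuck() fires, the rule currently driving behaviour could be discarded and a replacement re-sampled from the rest of the rule set.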

This environment is frustrating me, but who said PhD research was ever easy?

PhD Progress: Formalising Mario into a Relational Environment

One of the main problems currently facing me with implementing Mario is how to represent it. There are different tracks available to me, and I don’t really know which one would be best.

The first and possibly easiest track is to do the same with Mario as I did with PacMan – turn possible groups of actions into a single action (toDot, etc.). In Mario, this would mean actions like jumpOnGoomba and collectCoin. The problem with this is that it is practically defining the environment for a baby to learn in.

The second step up the generalisation ladder is to use only general actions, but ones still clear enough in their intent: jumpOver, jumpOnto, shoot. These actions are enough to suitably play Mario (there is still the problem of obstacles, when exactly to run, grabbing, hitting boxes, etc.), but they should be enough to evidence some form of play. These actions can (theoretically) be achieved by implementing a slot-splitting mechanic which splits a slot into slots of the identical action, but with different type predicates in their rules. So there’ll be a jumpOnto(coin) slot and a jumpOnto(enemy) slot; a sketch of the splitting step is below. The problem with this is type hierarchies. All things in Mario are objects, and some are coins, some enemies. Some enemies are also non-jumpable, which could be an observation the environment is allowed to provide. Who knows… perhaps the agent can learn which enemies are not jump-ontoable.
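As a rough sketch of the slot-splitting step (the Slot structure, the rule’s action_argument field, and the type_of helper are all assumptions for illustration):

    from dataclasses import dataclass, field

    @dataclass
    class Slot:
        action: str                 # e.g. 'jumpOnto'
        arg_type: str = None        # e.g. 'coin', 'enemy'
        rules: list = field(default_factory=list)

    def split_slot(slot, type_of):
        """Split a slot into one child slot per argument type seen in its rules."""
        children = {}
        for rule in slot.rules:
            t = type_of(rule.action_argument)  # type predicate of the rule's object
            children.setdefault(t, Slot(slot.action, t)).rules.append(rule)
        return list(children.values())

So a single jumpOnto slot whose rules mention coins and enemies would split into a jumpOnto(coin) slot and a jumpOnto(enemy) slot.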

Note that this system was theorised for Ms. PacMan once too, and may have to be implemented before Mario is. It is only a question of how much the slot-splitting mechanic will slow/decrease learning.

The third step is at keystroke level. There are numerous problems with this, specifically the fact that the keys are not relationally linked to the objects in the game, save Mario himself. Rules could be created for such things, but not with the rules as they are currently created, which use action objects for specialising the rules.