academic – Sam Sarjant

30.06.09, 26.09.11

PhD Progress: Infinity Pacman

Well, I started the grand experiment(s) yesterday at about noon (I can’t remember the exact time) and I have come back to check on them. After starting them, it occurred to me that they should output their progress every episode (after the generators were updated) so they could pick up where they left off.

However, the problem I came to find was not that of crashed experiments; they were all working fine. The problem was that of infinite PacMan. Two of the three experiments had scores in the millions, with spare lives just as large, and the third was exploiting a bug in the code where PacMan moves back and forth along the wall at a warp-point.

So, the first issue can be solved by having a level-limit. Perhaps I’ll stick with the 10 levels. So the nature of the experiment will be different to that of the paper, but it always was anyway. If the agent completes all 10 levels, then the episode ends.
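As a rough sketch of how such a limit could be checked (in Python, purely for illustration; `EpisodeState`, `episode_over` and the field names are hypothetical and not taken from the experiment code), the end-of-episode test might look like this. It also accepts a step cap, the other safeguard considered in this post; the value of `MAX_STEPS` is a placeholder.

```python
from dataclasses import dataclass

MAX_LEVELS = 10       # the level limit discussed above
MAX_STEPS = 100_000   # placeholder value for a step cap, not a tuned number


@dataclass
class EpisodeState:
    """Hypothetical snapshot of an episode; field names are assumptions."""
    level: int = 1    # level currently being played (1-based)
    steps: int = 0    # steps taken so far in this episode
    lives: int = 3    # lives remaining, including the current one
    score: int = 0


def episode_over(state: EpisodeState,
                 max_levels: int = MAX_LEVELS,
                 max_steps: int = MAX_STEPS) -> bool:
    """Return True when the episode should be stopped."""
    if state.lives <= 0:
        return True   # ordinary game over
    if state.level > max_levels:
        return True   # the agent has completed every allowed level
    if state.steps >= max_steps:
        return True   # safety net against games that never end
    return False
```

With a check like this, an agent that keeps racking up points and lives still finishes its episode once it clears level 10.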

The second issue seems to be a bug in the ghost code. Although PacMan was practically staying in the same place, the ghosts were just moving around in circles. A ghost’s movement is determined by a target point which it moves towards. However, this movement algorithm doesn’t seem to take into account warp points (paths in the level that go out one wall and come in at the opposite wall). I rarely, if ever, see a ghost take one of these warp paths! A further method of halting infinite games (perhaps PacMan is able to outmanoeuvre a ghost forever) is to simply place step limits on the episodes. Something quite big oughta do it.
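To illustrate what taking warp points into account might involve, here is a minimal Python sketch of a distance estimate that also considers going through a warp tunnel. Everything in it is an assumption for illustration: the function name, the idea that tunnels run along whole rows listed in `warp_rows`, and the use of Manhattan distance that ignores walls. It is not the ghost code from the experiment.

```python
from typing import Iterable, Tuple

Point = Tuple[int, int]  # (x, y) grid coordinates, x in 0..width-1


def warp_aware_distance(a: Point, b: Point, width: int,
                        warp_rows: Iterable[int]) -> int:
    """Manhattan distance from a to b, allowing a wrap through a tunnel row.

    Walls are ignored, so this is only a rough estimate.
    """
    ax, ay = a
    bx, by = b
    best = abs(ax - bx) + abs(ay - by)  # direct route, no warping

    for row in warp_rows:
        # Exit through the left wall on this row and re-enter on the right.
        via_left = (abs(ay - row) + ax) + 1 + ((width - 1 - bx) + abs(by - row))
        # Exit through the right wall on this row and re-enter on the left.
        via_right = (abs(ay - row) + (width - 1 - ax)) + 1 + (bx + abs(by - row))
        best = min(best, via_left, via_right)

    return best
```

For example, on a 28-wide maze with a tunnel on row 14, `warp_aware_distance((1, 14), (26, 14), 28, [14])` gives 3 rather than the direct 25, which is exactly the shortcut a ghost misses if its movement only ever compares direct distances to the target.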

Perhaps it is time to give the ghosts individual behaviour while I’m at it too. Currently, all the ghosts have the same greedy, but lonely, behaviour. A ghost will chase the player, but maintain its distance from other ghosts. This needs to be changed to the proper ghost logic seen in PacMan (or Ms. PacMan, in this case).

Anyway, here are the two policies that were all about achieving infinite points:
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if NEAREST_ED_GHOST<99.0 and NEAREST_POWER_DOT<5.0 then FROM_POWER_DOT+
[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+
[2]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-
[2]: if CONSTANT>0.0 then FROM_GHOST_CENTRE+
[2]: if NEAREST_GHOST>7.0 then FROM_GHOST-
[2]: if MAX_JUNCTION_SAFETY<1.0 then FROM_GHOST+
[2]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-
[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+
[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+
[2]: if NEAREST_POWER_DOT>10.0 then FROM_POWER_DOT-
[3]: if NEAREST_ED_GHOST<99.0 then FROM_POWER_DOT+
[3]: if NEAREST_GHOST>7.0 then FROM_GHOST-
[3]: if NEAREST_GHOST<4.0 then FROM_GHOST+

[1]: if NEAREST_GHOST<3.0 then FROM_GHOST+
[1]: if MAX_JUNCTION_SAFETY<2.0 then FROM_GHOST+
[1]: if GHOST_DENSITY<1.5 and NEAREST_POWER_DOT<5.0 then FROM_POWER_DOT+
[1]: if MAX_JUNCTION_SAFETY<2.0 then FROM_GHOST+
[1]: if CONSTANT>0.0 then TO_DOT+
[1]: if MAX_JUNCTION_SAFETY>3.0 then FROM_GHOST-
[1]: if NEAREST_GHOST<5.0 then FROM_GHOST+
[2]: if CONSTANT>0.0 then TO_DOT+
[2]: if CONSTANT>0.0 then FROM_GHOST+
[2]: if NEAREST_ED_GHOST>99.0 then FROM_POWER_DOT-
[2]: if NEAREST_ED_GHOST<99.0 then FROM_POWER_DOT+
[2]: if MAX_JUNCTION_SAFETY<3.0 then FROM_GHOST+
[2]: if NEAREST_GHOST>7.0 then FROM_GHOST-
[2]: if NEAREST_ED_GHOST>99.0 then TO_POWER_DOT+
[3]: if NEAREST_ED_GHOST<99.0 then TO_POWER_DOT-
[3]: if MAX_JUNCTION_SAFETY>5.0 then TO_SAFE_JUNCTION-
[3]: if NEAREST_ED_GHOST>99.0 then FROM_POWER_DOT-

Big policies, but they were still from the early part of the cross-entropy process. They seem to basically focus on getting all the dots, and avoiding ghosts when near.
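For poking at policies like these, it can help to pull the text apart into structured rules first. The following Python sketch parses the printed form used above; the `Rule` type and `parse_policy` function are hypothetical names, and the bracketed number and the trailing + or - are stored as-is without being interpreted.

```python
import re
from typing import List, NamedTuple, Tuple

Condition = Tuple[str, str, float]  # (observation name, "<" or ">", threshold)


class Rule(NamedTuple):
    index: int                       # the bracketed number, kept uninterpreted
    conditions: Tuple[Condition, ...]
    action: str                      # e.g. "TO_DOT"
    sign: str                        # trailing "+" or "-", kept uninterpreted


RULE_RE = re.compile(r"\[(\d+)\]:\s*if\s+(.+?)\s+then\s+([A-Z_]+)([+-])")
COND_RE = re.compile(r"([A-Z_]+)\s*([<>])\s*([0-9.]+)")


def parse_policy(text: str) -> List[Rule]:
    """Parse rules written as '[n]: if A<1.0 and B>2.0 then ACTION+'."""
    rules = []
    for match in RULE_RE.finditer(text):
        conditions = tuple(
            (name, op, float(value))
            for name, op, value in COND_RE.findall(match.group(2))
        )
        rules.append(Rule(int(match.group(1)), conditions,
                          match.group(3), match.group(4)))
    return rules
```

Once parsed, it is easy to count how many rules mention `TO_DOT` versus `FROM_GHOST`, which is a quick way to check the impression that the policies mostly chase dots and dodge nearby ghosts.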

18.03.09, 26.09.11

Linux tips

Tips for me, that is.

Because Linux seems to love to be driven by uncooperative, fiddly commands with little sense towards what they’re used for, I figure I should write down any I learn.

sudo su - student
Will get me into the student account on WDM.

vi (filename)
When all you want is to edit the text and you can’t double click a file… Using it is another problem altogether though.

ssh (place) -X
Allows you to SSH into a machine with graphical stuff permitted. Not everything should be run through the terminal!

MORE!
wc -l
Counts the number of lines in a file. Counts empty lines too.

Need a command for saving only 10% of a file. I’m pretty sure Linux would have something like that.

sed -n '10p'
Displays the 10th line from the file.

27.08.08, 26.09.11

Masters/PhD ideas

The 2008 year is coming to a close (sort of) and it’s time I started thinking of my future. I have received a job offer and I’ll follow that, just to see if I can get in, but I’m really leaning towards post-grad research.

I’m likely eligible for Masters but I recently realised that I could also be eligible for a PhD if I get good marks for my Honours year (which I have been so far). Doing a PhD would be pretty cool and is quite a distinguished title for me to carry. Although, it’s also another 3-5 years inside, which isn’t so great. But then again, 3-5 years at uni is probably more fun than 3-5 years at some dead-end corporate job. Also, if I do a PhD, I don’t have to move house, which works for me.

Anyway, I’m using this space to record possible PhD/Masters ideas that could be used for research purposes.

Artificial Intelligence: RL + FL
One idea I’ve been tossing around is the combination of Reinforcement Learning and Formal Logic. The way (that I personally believe) humans learn is by storing knowledge and building on that knowledge, using logic. By combining RL, which explores and exploits, and FL, which is a compact and useful knowledge representation, an AI agent could be created that learns rules and behaviour based on logical inference and probabilities.

The agent would learn by exploring the domain and working off what it already knows (strict rules that cannot be changed, like gravity for example), and store what it learns as a bunch of rules, which can be used to infer new rules as well as to hypothesise likely events. These hypotheses could be tested through agent-environment interaction, with the probability of their ‘truth’ set by the results obtained.

Mapping real-life video into digital 3D terrain
Using a video of an area, process it and create a 3D interpretation of it, stored as a 3D world. This would require object recognition from a video, thus including object tracking and placement. Also, depth would need to be evaluated. Because a single image wouldn’t be enough to properly view an area, one or more videos would be required. A single video probably wouldn’t get everything either, with some parts of an area skipped over. The program could either fill them in however it sees fit or prompt the user to make another video of the particular area, as indicated by the program. If done in real-time, as an incremental algorithm, the program could tell the user to go to certain areas when filming (“go over ‘here’ (show user location or direction) so I can learn its structure” says the computer).

This would require a great many things to work which currently aren’t at excellent levels: image processing (let alone video processing), object recognition and creation, and terrain creation, among others. It would be a tough project, likely not to be finished within a single PhD term by one person. Still, it’s an idea.