PhD Progress: Successful developments

Seems that all of my work over the past few weeks is paying off. All the additions of extra learning options and such. Blocks World is now able to complete its onAB learning task in little over 36 minutes (just under 20 minutes learning time) (this includes learning modules as well). The modules it learns are compact, neat and valid, consisting of minimal rules (thanks to the slot removal aspect). The convergance property allows the learning to progress quickly along with little over a minute per iteration used.

As for Ms. PacMan, it is still slow, but after 12 (learning) hours, the experiment is 2.7125% along (27.125% iteration). The ruleset is beginning to shape itself to something useful, with handy fromGhost rules (which include conditions for ghost state: aggressive) and toDot rules near the top, and other, less useful rules near the bottom/disappearing. Assuming it continues at this speed, one iteration will take ~45 hours (2 days). Hence the entire experiment takes 20 days, but these can be split up.

Speaking of splitting up, Eibe brought up the possibility of splitting the learning across multiple machines (i.e. Symphony). This could be easily achieved, as the very nature of the experiment allows it to be split. Simply send out X agents operating in their own environments and when they return, sort them in order and update generator. Then repeat. Of course this current system operates in an iterative manner, but the learning should be roughly equal if the update parameter is proprotional to the number of samples.

An alternative to that method is a much larger one which only requires 10 machines, each running entire experiments. Then the results are averaged and stored. But that takes much longer and doesn’t make full use of the number of processors Symphony has available.

Seems there is still the problem of statistical pre-goal unification which hasn’t upset the PacMan experiment yet, but is likely to when a pre-goal is created with edible ghosts. I’ll have top give more thought to sorting that out later.

A testing policy from PacMan:
Policy:
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 2.0 12.666666666666666)) (nonblinking ?X) (aggressive ?X) (pacman player) => (fromGhost ?X ?__Num7)
(distanceFruit player fruit ?__Num9&:(betweenRange ?__Num9 2.0 14.5)) (pacman player) => (toFruit fruit ?__Num9)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 51.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 19.5 29.25)) (pacman player) => (fromPowerDot ?X ?__Num2)
(distanceGhostCentre player ?X ?__Num8&:(betweenRange ?__Num8 0.0 13.0)) (pacman player) => (toGhostCentre ?X ?__Num8)
(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 34.0 43.0)) (pacman player) => (toGhost ?X ?__Num6)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -8.0 0.0)) => (toJunction ?X ?__Num4)
(distanceGhostCentre player ?X ?__Num5&:(betweenRange ?__Num5 0.0 52.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)

Clearly from ghost behaviour is most important, along with eating fruit and using the powerdot to keep the ghosts placid. The toDot is all encompassing and will always be active. The fromPowerDot rule will only trigger at a distance, so it has no value. The toGhost rules have little value currently, as they don’t include the edible attribute, but that is likely because they can’t be included. The last 2 are practically useless, though the all-compassing fromGhostCentre does lend some defensive behaviour.

Just had a thought about pre-goal mutation and such. Haven’t fleshed out the possibilities yet, but mutate rules based on what constant elements are present and mutate in relevant conditions seen in conjunction with said elements. So a rule concerning a ghost would mutate in the 4 attributes concerning the ghost: edible, aggressive, blinking or nonblinking. This could also create an opening for negation, allowing me to remove half of those attributes. I’ll think on it.