PhD Progress: Notes from NZCSRSC

While these notes don’t particularly concern what went on at NZCSRSC, I did jot them down during some particularly boring talks, so I wasn’t just snoozing through them.

– Perhaps modify the system to a more TD approach so rule probabilities are updated instantly (but with a low update param*). A problem with this is that I cannot know what a good policy achieves in terms of reward, so updates will require a relative ‘average’ score for the policies. Hence, rule updates in this manner cannot proceed until a window of results has been obtained, from which the average score of a number of policies can be calculated (sketched below).
* Perhaps the update param could scale upwards with a rule’s experience (num updates).
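
A rough sketch of what that might look like, assuming each rule carries a weight standing in for its selection probability and a sliding window of episode rewards is used as the baseline (all names below are made up for illustration):

```python
from collections import deque

WINDOW_SIZE = 50      # how many recent episodes make up the 'average' score
BASE_ALPHA = 0.05     # the low update parameter

recent_rewards = deque(maxlen=WINDOW_SIZE)   # sliding window of episode rewards

class RuleStats:
    """Hypothetical per-rule record: a weight standing in for the rule's
    selection probability, plus an experience count (number of updates)."""
    def __init__(self):
        self.weight = 0.0
        self.experience = 0

def update_rules(rules_fired, episode_reward):
    """TD-like update: nudge each fired rule towards how this episode
    compared against the windowed average, rather than the raw reward."""
    recent_rewards.append(episode_reward)
    if len(recent_rewards) < WINDOW_SIZE:
        return  # no reliable average yet, so hold the updates back
    baseline = sum(recent_rewards) / len(recent_rewards)
    relative = episode_reward - baseline   # relative 'average' score
    for rule in rules_fired:
        # The update parameter could scale up with the rule's experience.
        alpha = BASE_ALPHA * min(1.0, (rule.experience + 1) / WINDOW_SIZE)
        rule.weight += alpha * relative
        rule.experience += 1
```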

– ‘Pre-goal state’ works well for Blocks World, but for learning modular goals (Ms. PacMan: Eat Ghosts, StarCraft: Create Barracks), using ‘milestones’ instead may help mutate rules quicker.
— For example, the overall goal of Ms. PacMan is to eat dots. Hence, pre-goal: (dot ?X) + others…
— This says nothing about avoiding ghosts, unless the action set includes ‘avoid ghost’.

– The Ms. PacMan environment needs a fundamental format change. Perhaps the observations should be feature-based? It might depend on modularisation, and how well it can apply to Ms. PacMan. I think Szita and Lorincz’s approach may be the best, in which we have several different actions (eatDot, eatPowerDot, avoidGhost, eatFruit; is there really anything else?).
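
As a rough sketch of that decomposition (the module list mirrors the actions above; the observation fields and scores are entirely made up):

```python
# A Szita/Lorincz-style modular action set for Ms. PacMan. Each module
# inspects a (hypothetical) observation dict and votes for a direction;
# the strongest vote wins. The scores are placeholders.

def eat_dot(obs):
    return {"dir": obs["nearest_dot_dir"], "score": 1.0}

def eat_power_dot(obs):
    return {"dir": obs["nearest_power_dot_dir"], "score": 2.0}

def avoid_ghost(obs):
    return {"dir": obs["away_from_ghost_dir"],
            "score": 5.0 if obs["ghost_close"] else 0.0}

def eat_fruit(obs):
    return {"dir": obs["fruit_dir"],
            "score": 3.0 if obs["fruit_present"] else 0.0}

MODULES = [eat_dot, eat_power_dot, avoid_ghost, eat_fruit]

def choose_direction(obs):
    votes = [m(obs) for m in MODULES]
    return max(votes, key=lambda v: v["score"])["dir"]
```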

– Perhaps at every time-step, Ms. PacMan needs to moveTowards something and moveFrom something, instead of 3 general actions. The low-level direction depends on the situation. For example, moving towards X may also mean !moveFrom Y (failing to move away from Y), but moving at a 90 degree angle may actually be better.
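
One way to resolve the two into a single low-level direction would be to score each legal step against both objectives; the 90-degree case falls out naturally. Purely a sketch with invented vectors and weights:

```python
import math

DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def _cos(a, b):
    """Cosine similarity between two 2D vectors (0 if either is zero-length)."""
    na, nb = math.hypot(*a), math.hypot(*b)
    return 0.0 if na == 0 or nb == 0 else (a[0] * b[0] + a[1] * b[1]) / (na * nb)

def direction_score(step, towards_vec, away_vec, w_towards=1.0, w_away=1.0):
    """Reward alignment with the moveTowards target and penalise alignment
    with the moveFrom target; a step at 90 degrees to X can then beat the
    direct step when the direct step also runs towards Y."""
    return w_towards * _cos(step, towards_vec) - w_away * _cos(step, away_vec)

def choose_step(towards_vec, away_vec):
    return max(DIRECTIONS,
               key=lambda d: direction_score(DIRECTIONS[d], towards_vec, away_vec))

# Example: X lies to the right, Y lies down and to the right;
# the perpendicular step "up" scores best.
print(choose_step(towards_vec=(1, 0), away_vec=(1, 1)))
```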

– As Bernhard suggested, use background knowledge to simplify rules: either simplify the pre-goal state and use domain info when mutating, or filter mutations into the simplest format.

– It is likely that if the agent does find a solution to playing StarCraft, it will be but one strategy of many. An alternative to this is to have a multi-goal problem. For example, a Zergling Rush consists of: generate many Zerglings, find the enemy, ATTACK! (or something). The first two can be achieved simultaneously, but attacking comes later.
Only when all goals have been achieved will the environment be satisfied.
Again, this can be achieved modularly:
(generateZergs)
(findEnemy)
(attackEnemy)
or something like this (roughly sketched below)…
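
A very rough sketch of checking such a multi-goal condition, assuming the environment can report which sub-goal predicates currently hold (the goal names are the ones listed above):

```python
# The Zergling-rush goals. generateZergs and findEnemy may be pursued
# simultaneously, but the environment only counts as satisfied once
# every goal has been achieved.

GOALS = ["generateZergs", "findEnemy", "attackEnemy"]

def environment_satisfied(achieved):
    """'achieved' is the set of sub-goal predicates that currently hold."""
    return all(goal in achieved for goal in GOALS)

# The first two goals can hold at once, but the episode is not over
# until attackEnemy also holds.
assert not environment_satisfied({"generateZergs", "findEnemy"})
assert environment_satisfied({"generateZergs", "findEnemy", "attackEnemy"})
```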

– Because StarCraft can have so many units, each with its own actions, the actions list returned must be quite large, possibly the same size as the number of units.

– Refactor the system to return a variable number of actions, if the environment requires it.
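
A sketch of what that might look like (the policy interface here is hypothetical):

```python
def select_actions(policy, state, units):
    """Return one action per controllable unit, so the list handed back to
    the environment grows and shrinks with the current number of units.
    'policy.best_action' is a made-up per-unit action selector."""
    return [policy.best_action(state, unit) for unit in units]
```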

– Often in StarCraft or Ms. PacMan numbers are involved, like numDots, dotDistance, unitDistance…
The distance metrics can be handled using background knowledge (defined by the environment), but custom predicates can be defined which simply count the number of instances of each predicate (see the sketch after this list):
— count(block, <# blocks>)
— count(highest, <# highest>)
— count(clear, <# clear>)
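
A sketch of how those count predicates could be computed, assuming the state is available as a list of ground facts (predicate name plus arguments):

```python
from collections import Counter

def count_predicates(state_facts):
    """state_facts: iterable of ground facts, e.g. ("block", "a"), ("clear", "b").
    Returns a mapping usable as count(<predicate>, <n>) facts."""
    return dict(Counter(fact[0] for fact in state_facts))

# Example Blocks World state:
facts = [("block", "a"), ("block", "b"), ("block", "c"),
         ("clear", "a"), ("clear", "c"), ("highest", "a")]
print(count_predicates(facts))
# {'block': 3, 'clear': 2, 'highest': 1}  ->  count(block, 3), count(clear, 2), ...
```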

– These numbers can be used in reasoning – somehow…
— Easily defined with ‘accumulate’, anyway

– When dealing with distance metrics, actions need to be linked to the object at the least/greatest distance. So (eatDot ?Dot) will always go towards the closest dot.
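
For instance, binding (eatDot ?Dot) to the nearest dot could be as simple as this (the dot names and distances are made up):

```python
def bind_closest(objects, distances):
    """Bind an action's variable to the object with the smallest distance,
    so (eatDot ?Dot) always targets the nearest dot."""
    return min(objects, key=lambda o: distances[o])

dots = ["dot3", "dot7", "dot12"]
dot_distance = {"dot3": 4.0, "dot7": 1.5, "dot12": 9.0}
print(bind_closest(dots, dot_distance))   # -> "dot7"
```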

– Could just have the environment discretise distances – it will be an easier initial step anyway.
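
A first-pass discretisation might just bin raw distances into a few symbolic ranges (the bin boundaries here are arbitrary):

```python
def discretise_distance(distance):
    """Map a raw distance onto a coarse symbolic bin the rules can refer to."""
    if distance <= 2:
        return "adjacent"
    if distance <= 8:
        return "near"
    if distance <= 20:
        return "medium"
    return "far"

print(discretise_distance(1.5))   # -> "adjacent"
print(discretise_distance(12))    # -> "medium"
```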