Thinking about the Ms. PacMan environment, I realised it's kind of cheating, with all the actions based on particular behaviours. A true Ms. PacMan environment would use just two actions: moveFrom and moveTo.
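To make that concrete, here's a minimal sketch of what such a two-action interface could look like: the agent grounds one of two generic actions over the objects it can see, rather than choosing from a hand-built list of behaviours. All names here (Action, available_actions) are illustrative, not from any real environment API.

```python
# Hypothetical two-action interface: every choice is moveTo(x) or
# moveFrom(x) for some visible object x, instead of a fixed set of
# behaviour-specific actions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str    # "moveTo" or "moveFrom"
    target: str  # the object the action is grounded on

def available_actions(objects):
    """Ground the two generic actions over every visible object."""
    return [Action(kind, obj)
            for obj in objects
            for kind in ("moveTo", "moveFrom")]

print(available_actions(["dot", "ghost"]))
```

Note that the action set still grows with the number of objects, which is exactly why a larger hand-crafted action set makes learning easier in the first place.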
While these were proposed a while ago, the larger number of actions was chosen to make learning easier. But if I truly want a general learning algorithm, I can't always rely on actions being given. While I am working towards slot addition and removal (funny how I'm reimplementing things present in the original paper), which should be effective for learning a set of good rules per domain, I may also need to look at clustering states.
The same problem is present in Hanoi: essentially only two actions are needed: move the smallest disc, and move a different disc. The key is being able to distinguish the two. For this, I may need a relational clustering algorithm.
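A quick sketch of what that distinction looks like in practice, assuming a state representation of my own choosing (a tuple of pegs, each listed bottom-to-top): in any Hanoi position there are at most three legal moves, and classifying each by whether it lifts the smallest disc splits them cleanly into the two abstract actions.

```python
# Illustrative only: pegs is a tuple of tuples, each peg listed
# bottom-to-top, discs numbered by size (1 = smallest).

def legal_moves(pegs):
    """Enumerate legal (src, dst) peg indices: a disc may only land
    on an empty peg or on a larger disc."""
    moves = []
    for s, src in enumerate(pegs):
        if not src:
            continue
        top = src[-1]
        for d, dst in enumerate(pegs):
            if d != s and (not dst or dst[-1] > top):
                moves.append((s, d))
    return moves

def classify(pegs, move):
    """Label a move by which abstract action it instantiates:
    'smallest' if it lifts the smallest disc, 'other' otherwise."""
    smallest = min(disc for peg in pegs for disc in peg)
    return "smallest" if pegs[move[0]][-1] == smallest else "other"

state = ((3, 2), (1,), ())
for m in legal_moves(state):
    print(m, classify(state, m))
```

The interesting asymmetry: the smallest disc can always move (often to two places), while at most one non-smallest move exists per state, so the two classes really do behave differently, which is what a relational clustering algorithm would need to pick up on.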
Ergh. My mind is dead, so I can't continue this. Well, something to think about later on, perhaps.