The code for dropping into automatic modular learning may be a little tricky. But then again, because I already have code for learning towards a goal, it may not be so bad. This post is intended to organise my thoughts on how I will go about learning modules.
At the beginning of an iteration, the learning controller checks if there are any rule conditions which require modules to be loaded (conditions with only constant terms). If so, we can ‘drop into’ learning the module by essentially setting up another learning environment (recursion, anyone?). The problem with this is that PolicyGenerator is a singleton, so I may have to modify it to remain static but allow the instance to be swapped out.
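As a rough sketch of the swappable singleton idea (all names here are hypothetical; the real PolicyGenerator interface may differ), the class could keep a stack of suspended instances, so dropping into module learning pushes a fresh generator and popping restores the outer one:

```python
class PolicyGenerator:
    """Hypothetical sketch: a 'singleton' whose active instance can be
    swapped, enabling recursive module-learning environments."""
    _instance = None   # currently active generator
    _stack = []        # suspended generators (outer learning contexts)

    def __init__(self, goal):
        self.goal = goal

    @classmethod
    def get_instance(cls):
        return cls._instance

    @classmethod
    def push(cls, goal):
        # Suspend the current generator and activate a fresh one for the module.
        if cls._instance is not None:
            cls._stack.append(cls._instance)
        cls._instance = cls(goal)
        return cls._instance

    @classmethod
    def pop(cls):
        # Module learning finished: restore the outer generator.
        finished = cls._instance
        cls._instance = cls._stack.pop() if cls._stack else None
        return finished
```

The nesting is exactly the "recursion" above: learning on(a,b) can push a clear module, which could in turn push its own sub-module.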
The actual learning itself will be governed by the agent, and should be relatively fast, as the agent is able to dynamically set the goal. So, say the module being learned is clear. The agent receives a state, uses the LGG rules to navigate (these are known from previous pre-processing) and can then dynamically create goals from the new state. Because the agent notes the previous state, it can create pre-goals towards any dynamically created goals in the immediately following step.
For example, the agent has a blocks world of 5 blocks: 3 in one stack, 2 in another. Hence 2 blocks (let’s say a and e) are clear. The agent moves a to the floor, so now the state has 3 stacks: one of 1 block and two of 2 blocks. The clear blocks are now a, e and, say, c. The agent then examines whether it has achieved any imaginary clear goals (looking for things that are clear now that weren’t before).
Thought! I could conversely also create not-X modules – i.e. not clear – by looking for unachievements of goals (a clear predicate being removed). I have little use for this now, but it may be handy in StarCraft (not enemy trooper).
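Both directions can be sketched as a simple diff between consecutive states, assuming states are just sets of fact strings (a simplification of whatever the real state representation is): facts true now but not before are imaginary achieved goals, and facts true before but not now are the unachievements for not-X modules:

```python
def goal_deltas(prev_state, curr_state, pred="clear"):
    """Return (achieved, unachieved) facts of the given predicate:
    facts true now but not before, and facts true before but not now."""
    def facts(state):
        return {f for f in state if f.startswith(pred + "(")}
    prev, curr = facts(prev_state), facts(curr_state)
    return curr - prev, prev - curr

# The 5-block example: a is moved from its stack to the floor, so the
# block underneath it (c here) becomes clear.
prev = {"clear(a)", "clear(e)", "on(a,c)", "on(c,b)", "on(e,d)"}
curr = {"clear(a)", "clear(e)", "clear(c)", "onFloor(a)", "on(c,b)", "on(e,d)"}
achieved, unachieved = goal_deltas(prev, curr)
# achieved → {"clear(c)"}, unachieved → set()
```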
Back on track… So the agent notices (clear c) is new and tells itself that it has just achieved the clear c goal, so it notes down a generalised pregoal of the state (swapping c for ?_MOD_a), and continues in this fashion. Eventually the pregoal will settle (quite quickly, I imagine) and the agent can begin proper learning.
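A toy version of the settling process, again assuming fact-string states. Note this is deliberately simplified: real pregoal unification would also generalise other differing constants into variables, whereas this sketch only substitutes the achieved goal’s constant with ?_MOD_a and intersects successive pregoals until nothing changes:

```python
import re

def generalise(state, constant, param="?_MOD_a"):
    """Swap the goal constant for the module parameter, token-wise
    (so 'c' inside 'clear' is left alone)."""
    pat = re.compile(r"\b" + re.escape(constant) + r"\b")
    return {pat.sub(param, f) for f in state}

def update_pregoal(pregoal, state, constant):
    """Fold a newly observed goal achievement into the pregoal by
    intersection; the pregoal has settled once it stops shrinking."""
    g = generalise(state, constant)
    return g if pregoal is None else pregoal & g

# Two states observed just before clear(c) and clear(e) were achieved:
s1 = {"on(b,c)", "clear(b)", "onFloor(c)"}
s2 = {"on(b,e)", "clear(b)", "on(e,d)"}
p = update_pregoal(None, s1, "c")
p = update_pregoal(p, s2, "e")
# p → {"on(b,?_MOD_a)", "clear(b)"}
```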
Looking at this algorithm, I imagine the agent would be able to use it in regular learning (onab) too, and the pregoal state would be reached much more quickly. Not only that, but the agent could also exist in continuous domains without an obvious reward (i.e. just playing). This could be quite a big deal…
Hopefully I remember this, but for now I should just try and get the agent learning modules, then I can worry about continuous domains.
A side note: modularisation may not work with objects that are unique in their properties. For instance, block a performs differently to block b, yet both blocks satisfy the same predicates. This essentially turns the problem into a POMDP: if there is no information stating that block a is different to block b, then the agent is being deliberately misled.
Perhaps later I will attempt to tackle POMDPs, as they are a part of this world – we never truly know every property of an object. The original intent of this research (statistically driven logic – ProbLog) may be what it all comes around to in the end anyway. It shouldn’t be too difficult to record the probability with which facts occur, though it would probably require another rule engine. No matter, I can deal with it when I get there.
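For when I do get there, recording fact probabilities could start as simply as counting how often each fact holds across observed states (a made-up sketch, nothing ProbLog-specific):

```python
from collections import Counter

class FactRecorder:
    """Toy sketch: estimate P(fact) as the fraction of observed states
    in which the fact held."""
    def __init__(self):
        self.counts = Counter()
        self.num_states = 0

    def observe(self, state):
        # state is a set of fact strings, e.g. {"clear(a)", "on(a,b)"}
        self.num_states += 1
        self.counts.update(state)

    def probability(self, fact):
        if self.num_states == 0:
            return 0.0
        return self.counts[fact] / self.num_states
```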