PhD Progress: Automatic Environment Learning

I have recently been working on a mutation operator which creates new rules using the known predicates of the environment: for instance, adding cl(X) to a rule that does not already contain cl(X) but does mention the variable X. However, this process introduces two problems: it can create rules which are effectively duplicates, worded differently but logically the same (such as on(X,Y) -> B and on(X,Y) & abv(X,Y) -> B), and it can create completely useless rules whose conditions can never hold together (such as on(X,Y) & onFl(X) -> B). Furthermore, this process can introduce negation, so that needs to be accounted for as well.
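
To make the idea concrete, here is a minimal sketch of what such a specialisation mutation could look like, assuming conditions are held as plain strings like "on(X,Y)" and that the environment's known predicates are available with their arities. The class and method names are hypothetical rather than the actual agent code, and the background knowledge filtering is only indicated by a comment.

    import java.util.*;

    // Hypothetical sketch: specialise a rule by adding one known (unary) predicate
    // over a variable the rule already mentions, skipping conditions already present.
    public class RuleMutation {

        static List<List<String>> specialise(List<String> conditions,
                                             Map<String, Integer> knownPredicates,
                                             Set<String> variables) {
            List<List<String>> candidates = new ArrayList<>();
            for (Map.Entry<String, Integer> pred : knownPredicates.entrySet()) {
                if (pred.getValue() != 1) continue;            // keep the sketch to unary predicates
                for (String var : variables) {
                    String newCond = pred.getKey() + "(" + var + ")";
                    if (conditions.contains(newCond)) continue; // rule already contains it
                    // A real implementation would also reject candidates that background
                    // knowledge marks as redundant (implied) or illegal (never co-occur),
                    // and would handle negated conditions.
                    List<String> mutated = new ArrayList<>(conditions);
                    mutated.add(newCond);
                    candidates.add(mutated);
                }
            }
            return candidates;
        }

        public static void main(String[] args) {
            List<String> rule = List.of("on(X,Y)");
            Map<String, Integer> preds = Map.of("cl", 1, "onFl", 1, "highest", 1);
            specialise(rule, preds, Set.of("X", "Y")).forEach(System.out::println);
        }
    }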

So far I have got around this by using existing background knowledge to check rules, and by introducing a new form of background knowledge (well, it is the same form really) which isn’t evaluated by the JESS compiler but is still a valid and legal rule for the environment. However, as I wrote up these rules, I realised that I am essentially telling the agent the dynamics of the environment, which a smarter planning agent could use to achieve its goals. That isn’t really a bad idea at all; perhaps something I’ll look into later. Anyway, it led me to a new mechanism for the agent to learn the environment itself.

Because the agent spends so much time in the environment, it should be able to learn the dynamics of the environment by itself: which conditions always occur together, and which never do. If the agent learns the environment, the environment designer doesn’t have to declare every background knowledge rule; only the ones required for asserting predicates automatically need to be declared.

The first problem this task faces is extra overhead, but as all of the data about the state has already been collected, the agent need only sort it and check it against its current beliefs about the environment’s structure. The second problem is deciding when to stop checking the environment. Because the agent only checks the environment for the first few episodes (for covering purposes), it may find its beliefs about the environment to be short-sighted. Like many of the agent’s learning mechanisms, this one may simply have to settle after a number of episodes, with forced checks only when the agent covers new rules and perhaps for the first few pre-goal states. A rough version of that policy is sketched below.
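
As an illustration only (the thresholds and names here are placeholders I am assuming, not settled values), the decision of whether to scan a state might look something like this:

    // Hypothetical sketch of when the agent re-checks the environment model:
    // always while the model is settling, then only on newly covered rules or
    // for the first few pre-goal states.
    class ObservationPolicy {
        private static final int SETTLING_EPISODES = 10;  // assumed threshold
        private static final int PRE_GOAL_CHECKS = 5;     // assumed threshold

        boolean shouldScanState(int episode, boolean coveredNewRule, int preGoalsSeen) {
            if (episode < SETTLING_EPISODES) return true;  // still settling the model
            if (coveredNewRule) return true;               // forced check on a newly covered rule
            return preGoalsSeen < PRE_GOAL_CHECKS;         // a few early pre-goal states
        }

        public static void main(String[] args) {
            ObservationPolicy policy = new ObservationPolicy();
            System.out.println(policy.shouldScanState(3, false, 0));   // true: still settling
            System.out.println(policy.shouldScanState(20, false, 8));  // false: model settled
        }
    }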

This learning mechanism may eventually be capable of learning major shifts in the environment, but for now it can just learn rules that hold constant across the entire environment.

The mechanism operates by maintaining three lists over the (non-type) conditions: sometimes true, never true, and always true (Both, False, True). Whenever the agent encounters a state, it evaluates the conditions and their relations to the other conditions (with each condition in simplified variable form). Initially, after one state has been seen, all currently true conditions are in the True list and all other conditions in the environment are in the False list. For each state seen after that, conditions in either list can either stay where they are or shift to the Both list. Eventually, after a number of states, the observed behaviour will have been stable for X steps (and Y pre-goals will have been seen), so the agent can stop actively scanning the state and focus on learning which rules work. A sketch of this bookkeeping is given below.
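
Here is a rough sketch of that bookkeeping, assuming each condition has already been reduced to its simplified variable form (e.g. "cl(X)"). The class and method names are made up for illustration, and relations between pairs of conditions, as well as the pre-goal count, are left out to keep it short.

    import java.util.*;

    // Hypothetical sketch of the three-list bookkeeping over observed states.
    class EnvironmentModel {
        private final Set<String> alwaysTrue = new HashSet<>();  // True list
        private final Set<String> neverTrue  = new HashSet<>();  // False list
        private final Set<String> both       = new HashSet<>();  // Both list
        private boolean initialised = false;
        private int stableSteps = 0;  // states since a condition last changed lists

        // allConditions: every (non-type) condition the environment defines,
        // trueNow: the conditions that hold in the current state.
        void observeState(Set<String> allConditions, Set<String> trueNow) {
            boolean changed = false;
            if (!initialised) {
                alwaysTrue.addAll(trueNow);
                for (String c : allConditions)
                    if (!trueNow.contains(c)) neverTrue.add(c);
                initialised = true;
                changed = true;
            } else {
                // Conditions can only stay where they are or shift to the Both list.
                for (Iterator<String> it = alwaysTrue.iterator(); it.hasNext(); ) {
                    String c = it.next();
                    if (!trueNow.contains(c)) { it.remove(); both.add(c); changed = true; }
                }
                for (Iterator<String> it = neverTrue.iterator(); it.hasNext(); ) {
                    String c = it.next();
                    if (trueNow.contains(c)) { it.remove(); both.add(c); changed = true; }
                }
            }
            stableSteps = changed ? 0 : stableSteps + 1;
        }

        // True once no condition has changed lists for the required number of states.
        boolean isStable(int requiredSteps) { return stableSteps >= requiredSteps; }

        public static void main(String[] args) {
            EnvironmentModel model = new EnvironmentModel();
            Set<String> all = Set.of("cl(X)", "on(X,Y)", "onFl(X)", "highest(X)");
            model.observeState(all, Set.of("cl(X)", "onFl(X)"));  // first state seen
            model.observeState(all, Set.of("cl(X)", "on(X,Y)"));  // onFl(X) and on(X,Y) move to Both
            System.out.println(model.isStable(3));                // false: lists just changed
        }
    }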

There is a natural bonus to this system as well. I’m still not 100% sure it’s foolproof (any logic programming book should be able to confirm it for me), but implication allows learned relations to spread quickly across other conditions. For example, given highest(X) -> clear(X) and clear(X) -> !on(?,X), it follows that highest(X) -> !on(?,X). This may just be learned automatically anyway, but it may be beneficial to exploit it explicitly.
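
That chaining is just transitive closure over the learned implications. A quick sketch, assuming implications (with negated consequents simply written into the strings) are stored as a map from a condition to the conditions it entails; again, the names are illustrative only.

    import java.util.*;

    // Hypothetical sketch: repeatedly follow implications until no new consequent is derived.
    class ImplicationClosure {
        static Map<String, Set<String>> close(Map<String, Set<String>> implies) {
            Map<String, Set<String>> closed = new HashMap<>();
            implies.forEach((k, v) -> closed.put(k, new HashSet<>(v)));
            boolean changed = true;
            while (changed) {  // repeat until no new implication is derived
                changed = false;
                for (Set<String> consequents : closed.values()) {
                    for (String c : new HashSet<>(consequents)) {
                        Set<String> further = closed.get(c);
                        if (further != null && consequents.addAll(further)) changed = true;
                    }
                }
            }
            return closed;
        }

        public static void main(String[] args) {
            Map<String, Set<String>> implies = new HashMap<>();
            implies.put("highest(X)", new HashSet<>(Set.of("clear(X)")));
            implies.put("clear(X)", new HashSet<>(Set.of("!on(?,X)")));
            // highest(X) now also entails !on(?,X)
            System.out.println(close(implies).get("highest(X)"));
        }
    }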

This mechanism can be built into the existing covering class (as a separate class) which could be merged with the known ranges member. Ranges could be a problem too…