After reading the FOXCS thesis, I have an idea for a long-term system that may have merit. I have already begun venturing into the relational domain with the CE (cross-entropy) system, and it is proving more difficult than the propositional domain. The propositional domain has the benefit of a finite number of rules. Even in the random rule case for Ms. PacMan, the number of rules remained manageable.
Because relational rules give rise to a very large number of possibilities (even in a bounded relational world), and the best or most useful rules may not be immediately obvious to the system, new usable rules need to be introduced into the rule base throughout the learning process.
I have already worked out ways for this to happen, and rule deletion can be handled using rule fitness. FOXCS defines a useful way of determining fitness, which could be applied to the rules in the cross-entropy system. Essentially, we want to remove rules that aren’t working, mutate rules that work some of the time, and refine rules that are working.
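To make the remove/mutate/refine split concrete, here is a minimal sketch. It assumes each rule carries a single fitness value in [0, 1]; it does not reproduce FOXCS's actual fitness calculation, and the thresholds `REMOVE_BELOW` and `REFINE_ABOVE`, the `Rule` class and the `triage` function are all hypothetical names for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the post does not specify any values.
REMOVE_BELOW = 0.2
REFINE_ABOVE = 0.8


@dataclass
class Rule:
    conditions: tuple  # e.g. ("clear(X)", "on(X, Y)")
    action: str
    fitness: float     # assumed to lie in [0, 1]; not FOXCS's own measure


def triage(rule: Rule) -> str:
    """Decide what to do with a rule based only on its fitness."""
    if rule.fitness < REMOVE_BELOW:
        return "remove"   # not working
    if rule.fitness < REFINE_ABOVE:
        return "mutate"   # works some of the time
    return "refine"       # working: try specialisations of it


rules = [
    Rule(("clear(X)",), "move(X, floor)", 0.1),
    Rule(("clear(X)", "clear(Y)"), "move(X, Y)", 0.5),
    Rule(("clear(X)", "highest(Y)"), "move(X, Y)", 0.9),
]
for r in rules:
    print(r.action, triage(r))
# move(X, floor) remove
# move(X, Y) mutate
# move(X, Y) refine
```

Fixed thresholds are the simplest choice; relative thresholds (for example, against the mean fitness of the rule base) would be an equally plausible alternative.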
As stated in a previous post, rules that never fire (their conditions are never satisfied, let alone their action being taken) will be removed through the CE process. Rules that do fire but cause no change in the state space can be candidates for removal. This is easy to detect in Blocks World; Ms. PacMan will be trickier because it is a multi-agent environment, so the state may change even when the rule's action had no effect. The condition combinations used in such rules should also be recorded as no-go combinations. This may cause problems, but it is an area for further thought.
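A rough sketch of the no-op bookkeeping for a single-agent domain such as Blocks World might look like the following. It assumes states are represented as frozensets of ground facts, and that a rule is flagged after a hypothetical number of consecutive firings with no state change (`NO_OP_LIMIT`); `record_firing` and `is_no_go` are illustrative names only.

```python
from collections import defaultdict

# Hypothetical limit: consecutive no-op firings before a rule is flagged.
NO_OP_LIMIT = 3

no_op_counts = defaultdict(int)  # rule id -> consecutive firings with no state change
no_go_conditions = set()         # condition combinations recorded as "no-go"


def record_firing(rule_id, conditions, state_before, state_after):
    """Update no-op bookkeeping after a rule fires.

    Returns True when the rule has become a removal candidate, in which
    case its condition combination is also recorded as no-go.
    """
    if state_before == state_after:
        no_op_counts[rule_id] += 1
    else:
        no_op_counts[rule_id] = 0
    if no_op_counts[rule_id] >= NO_OP_LIMIT:
        no_go_conditions.add(frozenset(conditions))
        return True
    return False


def is_no_go(conditions):
    """Check whether a candidate rule uses an exact recorded no-go combination."""
    return frozenset(conditions) in no_go_conditions


# Usage: moving a block that is already on the floor to the floor changes nothing.
state = frozenset({"clear(a)", "onFloor(a)"})
conds = ("clear(X)", "onFloor(X)")
for _ in range(NO_OP_LIMIT):
    flagged = record_firing("r1", conds, state, state)
print(flagged, is_no_go(conds))  # True True
```

The exact-match check is deliberately conservative; treating any superset of a no-go combination as no-go would prune more aggressively, but it is also exactly the kind of over-generalisation that "may cause problems".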
Then there’s the covering operation, which could be used to form the initial rule base. It’s hard to say whether the system will need it, but I don’t think it will be disadvantageous. It would probably be a good addition, although it does affect the CE process somewhat. Perhaps rules added by covering should be excluded from the cross-entropy update for the round in which they are added.
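One way to exclude freshly covered rules is sketched below, using a standard smoothed cross-entropy update: each rule's probability moves toward its frequency among the elite samples, except for rules added by covering this round. The step size `ALPHA`, the per-rule probability dictionary and the `ce_update` function are assumptions for illustration, not the CE system's actual implementation.

```python
# Hypothetical smoothing / step size for the cross-entropy update.
ALPHA = 0.6


def ce_update(probs, elite_counts, num_elites, newly_covered):
    """Smoothed cross-entropy update of per-rule probabilities.

    Each rule's probability moves toward the fraction of elite samples
    that used it. Rules added by covering this round are left untouched,
    since they have not yet had a chance to appear in any samples.
    """
    updated = {}
    for rule, p in probs.items():
        if rule in newly_covered:
            updated[rule] = p  # skip covered rules until the next round
            continue
        observed = elite_counts.get(rule, 0) / num_elites
        updated[rule] = (1 - ALPHA) * p + ALPHA * observed
    return updated


probs = {"r1": 0.5, "r2": 0.5, "r3": 0.2}
elite_counts = {"r1": 4, "r2": 1}  # how many of the 5 elite samples used each rule
updated = ce_update(probs, elite_counts, num_elites=5, newly_covered={"r3"})
print(updated)  # r1 becomes about 0.68, r2 about 0.32, r3 stays at 0.2
```

Without the exclusion, a covered rule would have an observed frequency of zero and be pushed straight toward removal before it was ever sampled.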
The whole idea of the system is that it automatically evolves dynamic sets of good rules. One important point is that the policy slot probabilities need to be adjusted when new rules are added, perhaps weakened in proportion to the number of new rules.
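Assuming this refers to the distribution over rules within a single policy slot, one reading of "weakened in proportion to the number of new rules" is sketched below: existing probabilities are scaled by n_old / (n_old + n_new) and each new rule receives 1 / (n_old + n_new), so the slot still sums to 1. The function name `add_rules_to_slot` and this exact scaling scheme are hypothetical.

```python
def add_rules_to_slot(slot_probs, new_rules):
    """Insert new rules into a slot's rule distribution.

    Existing probabilities are scaled by n_old / (n_old + n_new) and each
    new rule receives 1 / (n_old + n_new). If the old probabilities summed
    to 1, the result also sums to 1.
    """
    # Ignore duplicates and rules already present in the slot.
    fresh = [r for r in dict.fromkeys(new_rules) if r not in slot_probs]
    if not fresh:
        return dict(slot_probs)
    total = len(slot_probs) + len(fresh)
    scale = len(slot_probs) / total
    updated = {rule: p * scale for rule, p in slot_probs.items()}
    for rule in fresh:
        updated[rule] = 1 / total
    return updated


slot = {"r1": 0.75, "r2": 0.25}
new_slot = add_rules_to_slot(slot, ["r3", "r4"])
print(new_slot)  # {'r1': 0.375, 'r2': 0.125, 'r3': 0.25, 'r4': 0.25}
```

This keeps the relative ordering of the existing rules while giving each new rule the same starting probability a uniform distribution would, which leaves the subsequent CE updates to decide whether they stay.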