While implementing the Mario environment, I had an idea for a different approach to preliminary rule testing. Initially, the agent could test only single-rule policies, each rule being either the RLGG or a single step away from the RLGG. This can determine which slots to split (don't bother splitting slots whose rules have no use) and allows the agent to quickly learn which rules are useful early on.
This will result in a minimal number of slots. As the agent goes on to test policies in the normal fashion, new slots can be created from useful rules in existing slots, including rules that had no initial use but gain one later on. This is much like beam search, which only expands the most useful candidates.
This strategy will only work if there is an intermediate reward or an easily attainable goal. Even so, the current strategy does swamp the agent early on, and it only lets up once slots or rules are found to be useless.
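The preliminary pass described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the agent's actual code: the `evaluate` callback, the `threshold` and `episodes` parameters, the string representation of rules, and the rule names in the toy example are all hypothetical. It assumes `evaluate` returns the reward earned by running a policy for one episode, which is where the need for an intermediate reward comes in.

```python
from typing import Callable, Dict, List, Sequence

Rule = str  # hypothetical stand-in for a relational rule


def preliminary_single_rule_test(
    slots: Dict[str, List[Rule]],
    evaluate: Callable[[Sequence[Rule]], float],
    episodes: int = 3,
    threshold: float = 0.0,
) -> Dict[str, List[Rule]]:
    """Test each rule as a one-rule policy; keep only slots with a useful rule."""
    useful: Dict[str, List[Rule]] = {}
    for slot_name, rules in slots.items():
        scored = []
        for rule in rules:
            # Average reward of the policy consisting of this single rule.
            avg = sum(evaluate([rule]) for _ in range(episodes)) / episodes
            if avg > threshold:
                scored.append((avg, rule))
        if scored:
            # Best rules first, so later splitting can start from them.
            scored.sort(key=lambda pair: pair[0], reverse=True)
            useful[slot_name] = [rule for _, rule in scored]
    return useful


# Toy example with made-up rules and fixed rewards (assumptions, not real data).
toy_rewards = {
    "jumpOver(enemy)": 5.0,
    "jumpOver(nearestEnemy)": 8.0,
    "move(left)": 0.0,
}
slots = {
    "jumpOver": ["jumpOver(enemy)", "jumpOver(nearestEnemy)"],
    "move": ["move(left)"],
}
result = preliminary_single_rule_test(slots, lambda policy: toy_rewards[policy[0]])
```

In this toy run the `move` slot is dropped because its only rule earns nothing, while the `jumpOver` slot survives with its rules ordered by reward. Slots dropped here would not be gone for good: they could be re-tested later, once normal policy testing gives their rules a use.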