With mutation completed and modularisation up next, there needs to be a middleman of sorts. The fixed 1000-step CE update process is too rigid to let the agent split off into its own internal goal. So, the next step is to refactor the system so that the agent decides when to update. Note that this agent is not the same as the acting agent; in effect, this architecture becomes an actor-critic architecture.
In any case, the critic controls what the actor is doing and observes the outcomes. The dynamic part of the system is that the number of episodes per iteration is adjusted towards the number of rules in the rulebase. In the first standard 100-random-rule experiments, 1000 episodes were used (fairly sure it was 1000 for the 30-rule experiments too). So, I propose that the population of each iteration is adjusted to 10 * the size of the largest slot. Furthermore, the elites were made up of 5% of the samples. This could be modified too, to perhaps 10%, or 20-25% using weights (the lower-ranked samples get a lower update weight).
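A minimal sketch of what that sizing and weighted elite selection could look like, assuming samples expose a scalar score and slots are just collections of rules (all names here are placeholders, not the actual codebase):

```python
import numpy as np

def population_size(slots, factor=10):
    # Episodes per iteration scale with the largest slot: 10 * max slot size.
    return factor * max(len(slot) for slot in slots)

def weighted_elites(samples, elite_frac=0.2):
    # Keep the top elite_frac of samples; lower-ranked elites get less weight.
    ranked = sorted(samples, key=lambda s: s.value, reverse=True)  # assumes a .value score
    n_elite = max(1, int(elite_frac * len(ranked)))
    elites = ranked[:n_elite]
    weights = np.linspace(1.0, 1.0 / n_elite, n_elite)  # best -> 1, worst -> ~1/n
    return elites, weights / weights.sum()
```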
The proposed algorithm is as follows:
1) Critic continually runs the actor until all LGGs are found and the pre-goal has settled. Perhaps some form of guidance towards previously seen elite samples could be used here.
2) Once 1 is complete, begin the formal testing over iterations, recording every sample's performance.
3) Once we have 10*max(slot.length) samples, begin the update and post-update processes (possibly creating new mutations).
4) If using a goal with constants, it is likely that modularisation will need to take place (unless we already have a module for the problem). If we mutate a constant fact, create/load a module. Creation essentially amounts to recursively creating the entire system. More on this step later.
5) If, after X iterations, a slot remains roughly the same (same best rule, which only gets better), then fix it in place and create a new slot without that best rule.
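To make steps 1-5 concrete, here is a rough sketch of the critic's outer loop, reusing weighted_elites and population_size from above. Every other call in it (run_episode, ce_update, mutate, create_module, and so on) is a placeholder for a routine that doesn't exist yet; only the control flow is meant to be taken seriously:

```python
def critic_loop(actor, slots, goal, stale_limit=5, elite_frac=0.2):
    # Step 1: run the actor until all LGGs are found and the pre-goal settles.
    while not (actor.all_lggs_found() and actor.pre_goal_settled()):
        actor.run_episode(goal)

    while not actor.solved(goal):
        # Steps 2-3: record 10 * max slot size samples, then update.
        samples = [actor.run_episode(goal)
                   for _ in range(population_size(slots))]
        elites, weights = weighted_elites(samples, elite_frac)
        ce_update(slots, elites, weights)   # CE update over the slot distributions
        mutate(slots)                       # post-update step, possibly new mutations

        # Step 4: a goal with constants may need a module.
        if goal.has_constants() and not module_exists(goal):
            create_module(goal)             # recursively builds a sub-system

        # Step 5: freeze slots whose best rule has been stable for a while,
        # and open a fresh slot that excludes that rule.
        for slot in list(slots):
            if slot.stale_iterations >= stale_limit:
                slot.frozen = True
                slots.append(new_slot(exclude=slot.best_rule))
```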
That should do it for now.