While it may prove to be fruitless, I had an idea for a more dynamic CE algorithm.
The system uses a shifting window to find the ideal rules: it first eliminates ineffective rules and then regenerates new rules from the leftovers.
The window holds R samples, where R is the number of rules. Once R samples have been collected, the algorithm begins: it removes all rules used in the X*R worst policies and updates the generators for the rules used in the X*R best policies, for some fraction X (with a reduced update step alpha, since the updates are smaller and more frequent). The window is then shifted by X*R policies, which are newly generated from the updated generators, and the process repeats.
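To make the window shift concrete, here is a rough sketch of a single step. It rests on several assumptions of mine that are not settled above: the generator is a plain dict mapping each rule to an inclusion probability, a policy is a frozenset of rules, the X*R worst policies are the ones dropped from the window, a rule is only removed if it does not also appear in one of the best policies, and `evaluate` is a hypothetical scoring callback (higher is better). The names `window_step`, `probs` and `evaluate` are made up for illustration.

```python
import random


def window_step(probs, window, x, alpha, evaluate, rng):
    """Shift the window once and return the new window.

    probs:    dict rule -> inclusion probability (assumed generator form)
    window:   list of (policy, score) pairs; a policy is a frozenset of rules
    x:        fraction of the window treated as best / worst (x <= 0.5)
    alpha:    step size for the generator update
    evaluate: hypothetical callback that scores a policy
    rng:      random.Random instance used for sampling
    """
    k = max(1, int(x * len(window)))
    ranked = sorted(window, key=lambda item: item[1])  # ascending score
    worst, best = ranked[:k], ranked[-k:]

    # Remove rules used by the worst policies, unless a best policy
    # also uses them (an assumption, to avoid deleting good rules).
    elite_rules = set().union(*(policy for policy, _ in best))
    doomed = set().union(*(policy for policy, _ in worst)) - elite_rules
    for rule in doomed:
        probs.pop(rule, None)

    # Move each surviving rule's probability towards its frequency
    # among the best policies, scaled by the (small) alpha.
    for rule in probs:
        freq = sum(rule in policy for policy, _ in best) / k
        probs[rule] = (1 - alpha) * probs[rule] + alpha * freq

    # Drop the k worst policies and sample k fresh ones.
    fresh = []
    for _ in range(k):
        policy = frozenset(r for r, p in probs.items() if rng.random() < p)
        fresh.append((policy, evaluate(policy)))
    return ranked[k:] + fresh
```

Calling `window_step` repeatedly on its own output gives the repeating process; `probs` is updated in place, so it shrinks as rules are removed.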
A large alpha value brings a big risk of settling on local optima: the generators converge so quickly that any further rules they produce are the same. So it is very important to keep alpha small.
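A quick way to see the effect of alpha is to count how many consecutive updates it takes for one rule's probability to become effectively fixed. This sketch assumes the usual smoothed update p <- (1 - alpha) * p + alpha * f, with the rule appearing in every best policy (f = 1); the function name and the 0.99 threshold are my own choices for illustration.

```python
def updates_until_locked(alpha, start=0.5, threshold=0.99):
    """Count updates until a rule's probability reaches `threshold`,
    assuming the rule appears in every best policy (frequency 1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    p, steps = start, 0
    while p < threshold:
        p = (1 - alpha) * p + alpha * 1.0
        steps += 1
    return steps


fast = updates_until_locked(0.6)   # large alpha: locks in after 5 updates
slow = updates_until_locked(0.05)  # small alpha: takes far more updates
```

With a large alpha the generator commits after only a handful of window shifts, which leaves little room for other rules to be sampled and tested.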
The second part of this is the regeneration/mutation operators, which create new rules to replace those that have been deleted. Perhaps these should only kick in once the rule base has shrunk to half its size. That threshold is another parameter to optimise.
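The half-size trigger could look something like the sketch below. Everything specific here is an assumption: `mutate` is a hypothetical operator that turns an existing rule into a new one, new rules start with a guessed probability of 0.5, and the rule base is refilled to its original size in one go.

```python
import random


def regenerate(probs, original_size, mutate, rng, trigger=0.5, initial_prob=0.5):
    """Refill the rule base once it has shrunk to `trigger` of its size.

    probs:         dict rule -> probability (assumed generator form)
    original_size: number of rules the rule base started with
    mutate:        hypothetical operator, mutate(rule, rng) -> new rule
    """
    if len(probs) > trigger * original_size:
        return probs  # still large enough, nothing to do

    refilled = dict(probs)
    parents = list(probs)  # mutate only from the surviving rules
    attempts = 0
    # Cap the attempts so a mutate() that keeps producing duplicates
    # cannot loop forever.
    while len(refilled) < original_size and parents and attempts < 10 * original_size:
        attempts += 1
        child = mutate(rng.choice(parents), rng)
        if child not in refilled:
            refilled[child] = initial_prob
    return refilled
```

Both `trigger` and `initial_prob` would need tuning alongside alpha and X.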
If this algorithm is successful, it should greatly reduce the time it takes to complete and provide faster feedback.