With rule selection out of the way, I move on to the next issue to address: slot addition/removal. In some problems, the same action may need to be performed more than once under different circumstances. For example, in Towers of Hanoi, all we can do is move a tile from A to B. But a single rule is not enough to solve the problem; in fact, it will only get us one, maybe two steps. On the other side of the equation, for the sake of policy simplification, it is better to remove slots we never use than to have them hang around. Sure, in Ms. PacMan every action has a use and could potentially have a rule that works for it, but some actions are better left without one (toGhostCentre and fromPowerDot, for example).
As a simpler case, Blocks World has an action without a use: moveFloor. Yes, it is used inside modules, but for a policy that uses those modules it is useless. So for the sake of these environments, and possibly future ones, I will implement an optional slot addition/removal system.
The algorithm is something like this:
– Each slot maintains an M value for the base chance of it being used (initially M = 1, M >= 0) and an S value for variance (initially S = say 0.5, 0 <= S).
– The chance of a slot being used depends on a normally sampled value of m = M +- S.
– If m is above 1.0, the slot will be used but not removed from the slot distribution. The slot stays in the distribution with a value of m - 1.0, and the check is repeated on that remaining value.
– If m is below 1.0, the slot will only be used (and removed from the distribution) if a random number is below m. Otherwise the slot is not used but is removed from the distribution.
For example, a slot has M = 1.0 and S = 0.5 and is drawn with m = 1.4. Hence, the slot is used at least once and has a 40% chance of being used twice.
As another example, a slot is drawn with m = 2.3. The slot is used at least twice and has a 30% chance of being used thrice.
As a final example, a slot is drawn with m = 0.8. The slot only has an 80% chance of being used at all.
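The sampling steps and the three examples above can be sketched in Python. This is only a rough sketch under a few assumptions of mine: S is treated as the standard deviation of the normal draw, a draw of exactly 1.0 is handled by the "below 1.0" branch, and the function names `uses_from_draw` and `sample_slot_uses` are hypothetical.

```python
import random


def uses_from_draw(m, rng=random):
    """Turn one drawn value m into the number of times the slot is used."""
    uses = 0
    # Above 1.0: use the slot and keep it in the distribution with value m - 1.0.
    while m > 1.0:
        uses += 1
        m -= 1.0
    # At or below 1.0: use the slot once more only if a random number falls
    # below m; either way the slot now leaves the distribution.
    if rng.random() < m:
        uses += 1
    return uses


def sample_slot_uses(M, S, rng=random):
    """Draw m around M with spread S, then resolve it into a use count."""
    # Assumption: S is used directly as the standard deviation of the draw.
    m = rng.gauss(M, S)
    return uses_from_draw(m, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for m in (1.4, 2.3, 0.8):
        counts = [uses_from_draw(m, rng) for _ in range(10000)]
        print(m, sum(counts) / len(counts))
```

One consequence of this scheme is that, for a non-negative draw, the expected number of uses equals m itself (for m = 1.4, one guaranteed use plus a 40% chance of a second), so the printed averages should land close to the drawn values. A negative draw simply results in the slot not being used.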
During the update procedure, the slot values M and S are updated as well. M is updated in a standard step-wise manner using the elite solution count. S needs to slowly decrease over time if M remains relatively stable; perhaps the standard deviation of the elite samples can be calculated and used to update S in a step-wise manner. That should suffice.
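One possible shape for that update, sketched in Python. Everything here is an assumption rather than a settled design: I take "elite solution count" to mean how many times the slot was used in each elite sample, `alpha` is a hypothetical step size, and `update_slot` is a made-up name.

```python
from statistics import mean, pstdev


def update_slot(M, S, elite_use_counts, alpha=0.5):
    """Step M and S towards the statistics of the elite samples.

    elite_use_counts: how many times this slot was used in each elite
    sample (assumed interpretation of the "elite solution count").
    alpha: hypothetical step size in (0, 1].
    """
    observed_M = mean(elite_use_counts)
    observed_S = pstdev(elite_use_counts)  # standard deviation of the elites
    # Standard step-wise update: move part of the way to the observed value,
    # keeping both values non-negative.
    new_M = max(0.0, (1 - alpha) * M + alpha * observed_M)
    new_S = max(0.0, (1 - alpha) * S + alpha * observed_S)
    return new_M, new_S


if __name__ == "__main__":
    print(update_slot(1.0, 0.5, [1, 1, 2, 1]))
    print(update_slot(1.0, 0.5, [1, 1, 1, 1]))
```

When every elite sample uses the slot the same number of times, the observed deviation is zero, so S shrinks by a factor of (1 - alpha) on each update while M stays put. That gives the slow decrease in S described above.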