Another problem with the Ms. PacMan domain is the number of possible slots it can have. I haven’t even implemented slot freezing yet, and it already starts with around 8 actions. The algorithm needs to find the optimal placement of these slots somehow, and the current system isn’t particularly good at that. Sure, it can find the best slot by increasing its weight, but that’s really it. It doesn’t say which slot should always go last (or not at all!), nor which one should always go somewhere in the middle.
This needs to change. A possible solution is to split the probability distribution into indexed distributions (say, one per position), each of which contains every element. If an element is added or removed, it is added to or removed from all of the indexed distributions. An indexed probability distribution essentially works as an array of distributions.
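To make the idea concrete, here is a minimal sketch of what such an indexed distribution could look like. It is only an illustration, not the actual implementation: the class name `IndexedDistribution`, its methods, the plain multiplicative weight update, and the action names in the usage lines are all assumptions made up for this example.

```python
import random


class IndexedDistribution:
    """Hypothetical sketch: an array of weighted distributions sharing one element set."""

    def __init__(self, size, elements=(), initial_weight=1.0):
        self.initial_weight = initial_weight
        # One weight table per index (assumed here to mean one per slot position).
        self.tables = [{} for _ in range(size)]
        for element in elements:
            self.add(element)

    def add(self, element):
        # Adding an element adds it under every indexed distribution.
        for table in self.tables:
            table.setdefault(element, self.initial_weight)

    def remove(self, element):
        # Removing an element removes it under every indexed distribution.
        for table in self.tables:
            table.pop(element, None)

    def scale(self, index, element, factor):
        # Change an element's weight at one index only (assumed update rule).
        self.tables[index][element] *= factor

    def probability(self, index, element):
        # Normalise on demand, so the weights never need to sum to 1.
        table = self.tables[index]
        return table[element] / sum(table.values())

    def sample(self, index, rng=random):
        # Draw one element from the distribution stored at this index.
        table = self.tables[index]
        elements = list(table)
        weights = [table[e] for e in elements]
        return rng.choices(elements, weights=weights, k=1)[0]


# Usage with made-up action names: favour "left" in the last position only.
dist = IndexedDistribution(size=3, elements=["up", "down", "left", "right"])
dist.scale(2, "left", 3.0)
print(dist.probability(2, "left"))  # 0.5
print(dist.probability(0, "left"))  # 0.25
```

Because each index keeps its own weights, an element can become likely in the last position while staying unlikely everywhere else, which is exactly what a single shared distribution can't express.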