PhD Progress: PacMan Results and Fair Rule Sampling

Ran some PacMan experiments over the weekend (well, a PacMan experiment). After the refactoring, Ms. PacMan is running much faster, though the speedup is probably also partly because the experiments are level-limited.

Anyway, I had some results from previous runs, but there is little point in displaying their graphs. Suffice it to say, early convergence (where the total update size is 1 * the alpha update) and a low population constant (10) are not the best for learning.

Here are some results from an agent learning with a convergence value of 0.1 (which seems reasonable, given the graph) and a population constant of 30. The selection ratio was set at 0.1, but it could perhaps be lower still with a larger population constant.
[Image lost: results graph for the above run]

Something else I need to experiment with is a more precise measure of population constants and the amount of testing. At the moment, the experiment uses the population constant directly, which can result in small elite update sizes that may not be ideal. To fairly test every rule in the policy generator, every rule needs an equal chance of being included in a meta-iteration (preferably earlier rather than later). This relates to a maximum sample value for every slot's rules, where the value of a slot is given by the number of its rules squared. This gives every rule an equal chance:

For example, a slot has 3 rules, all initially at 0.33 probability. Therefore, assuming Bernoulli sampling, fairly sampling each rule would require 1/0.33 = 3 samples per rule, totalling 9 samples altogether. Given a fair distribution, each rule will have been sampled 3 times, but if not, each was at least given a fair chance.
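As a minimal sketch of that budget calculation (names are hypothetical, not the actual experiment code), the number of samples needed per slot works out to the slot size squared:

```python
def fair_sample_budget(num_rules: int) -> int:
    """Samples needed so that each rule in a slot with uniform 1/n
    probabilities is expected to be drawn num_rules times:
    n rules * (1 / (1/n)) samples per rule = n^2 samples."""
    return num_rules ** 2


def expected_draws_per_rule(num_rules: int, total_samples: int) -> float:
    """Expected number of times any one rule appears in total_samples
    uniform draws from the slot."""
    return total_samples / num_rules


if __name__ == "__main__":
    n = 3  # the 3-rule slot from the example above
    budget = fair_sample_budget(n)
    print(budget, expected_draws_per_rule(n, budget))  # 9 3.0
```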

The reasoning behind this is that sometimes a rule simply has no chance of being re-recognised as useful. Say a rule has a sampling probability of 0.1 in a slot of x rules. Odds are, it will only be sampled one time in 10, giving it a placement in the elite solutions of roughly 1 for every 10 samples. If the elite sample size is 10, then the rule will only ever average against an elite distribution value of 0.1, meaning it can never grow unless it happens to be sampled more often through random sampling. I think I might be off in a few areas here, but the system makes sense to me.
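To make that stalling effect concrete, here is a rough simulation under an assumed cross-entropy-style update (the actual update in the experiments may well differ): a rule sampled at probability 0.1 appears in about 10% of elite solutions, so its elite frequency hovers near 0.1 and the update leaves it, on average, exactly where it started.

```python
import random


def run(p: float, alpha: float, elite_size: int, iterations: int) -> float:
    """Evolve a single rule's probability p under the assumed update
    p <- (1 - alpha) * p + alpha * (elite frequency)."""
    for _ in range(iterations):
        # Each elite solution includes the rule with probability p,
        # assuming the elites are an unbiased subsample of what was drawn.
        elite_freq = sum(random.random() < p for _ in range(elite_size)) / elite_size
        p = (1 - alpha) * p + alpha * elite_freq
    return p


if __name__ == "__main__":
    random.seed(0)
    finals = [run(p=0.1, alpha=0.6, elite_size=10, iterations=50)
              for _ in range(1000)]
    # Individual runs drift randomly, but the average final probability
    # stays near 0.1: the rule cannot grow in expectation.
    print(sum(finals) / len(finals))
```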

Granted, the square of a slot's size can get big, but the pruning mechanic should keep slot sizes low.