PhD Progress: Occam’s Razor

Well, Ockham’s Razor, really. Or something. Anyway, the concept can apply to my cross-entropy algorithm: an elite policy with fewer rules should be rewarded more than an equivalent policy with more rules. This could be achieved by weighting the size of the policy update with an inverse function of the number of rules in the policy. However, we still want to reward elite solutions even if they have the maximum number of rules, so the function should perhaps only affect half of the update.
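
To make the idea concrete, here is a rough sketch of what such a weighting might look like. Everything in it is an assumption for illustration rather than what the algorithm actually does: the names `parsimony_weight` and `weighted_update`, the simple `1 / num_rules` inverse, the `fixed_fraction` of 0.5 (the "half" that is never penalised) and the `step_size` of 0.6 are all hypothetical.

```python
def parsimony_weight(num_rules, fixed_fraction=0.5):
    """Weight in (fixed_fraction, 1] that shrinks as the policy gains rules.

    Assumption: a plain 1/n inverse. fixed_fraction is the share of the
    update that every elite policy receives regardless of its size.
    """
    if num_rules < 1:
        raise ValueError("a policy needs at least one rule")
    if not 0.0 <= fixed_fraction <= 1.0:
        raise ValueError("fixed_fraction must lie in [0, 1]")
    return fixed_fraction + (1.0 - fixed_fraction) / num_rules


def weighted_update(old_prob, elite_freq, num_rules,
                    step_size=0.6, fixed_fraction=0.5):
    """Smoothed cross-entropy style update, scaled by the parsimony weight.

    old_prob:   current probability of picking a rule (hypothetical)
    elite_freq: how often that rule appears among the elite samples
    """
    # Smaller policies get a step closer to the full step_size.
    effective_step = step_size * parsimony_weight(num_rules, fixed_fraction)
    return (1.0 - effective_step) * old_prob + effective_step * elite_freq
```

With these defaults a one-rule policy gets the full update, while a very large policy still gets just over half of it, so elite solutions are always rewarded to some degree. Whether a plain `1 / num_rules` falls off too quickly is an open question; something normalised by the maximum number of rules might behave better.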