The test of fired policies (updating only the rules from the policy that actually fired) returned horrendous results. I suspect a disastrous bug in the code is to blame, perhaps one that causes the opposite (non-firing) rules to be used in the updates. I’ll have to run some tests to find out. Unfortunately, due to some changed code, the policies weren’t output, but I doubt they would have helped anyway.
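To make the idea concrete, here is a minimal sketch of what a fired-only update might look like. It is not the actual code: the function name `update_rule_probs`, the `(policy_rules, fired_rules)` pairing, the `step_size` parameter and the frequency-based update are all assumptions made for illustration.

```python
from collections import Counter


def update_rule_probs(probs, elites, step_size=0.5, fired_only=True):
    """Move rule probabilities towards the rule frequencies seen in the elites.

    probs  -- dict mapping rule name to its current probability (sums to 1)
    elites -- list of (policy_rules, fired_rules) pairs, one per elite sample
    (Both structures are hypothetical, chosen only for this sketch.)
    """
    counts = Counter()
    for policy_rules, fired_rules in elites:
        # Fired-only mode counts a rule only if it actually fired during play;
        # otherwise every rule in the policy is counted.
        chosen = fired_rules if fired_only else policy_rules
        counts.update(rule for rule in set(chosen) if rule in probs)

    total = sum(counts.values())
    if total == 0:
        # Nothing was observed, so leave the distribution untouched.
        return dict(probs)

    # Blend the old probabilities with the observed frequencies.
    return {
        rule: (1 - step_size) * p + step_size * counts[rule] / total
        for rule, p in probs.items()
    }
```

If the fired and non-fired sets were swapped by mistake, rules that never influenced play would be the ones reinforced, which is the kind of bug suspected here.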
Anyway, here are the results for 25 episodes:
Average episode elite scores:
As you can see, they get progressively worse. I’ll do a regular run now to see whether the bug affects the code as a whole. I also need a hand-coded baseline to compare altered versions against.
Update: Looking at the generatedOutput file, it appears that none of the distributions were updated at all. Only the policy slot values were updated, and they have dropped dramatically, causing policies to be empty. So there was no bug in the policy output; the agent was simply playing completely randomly. Clearly, there is a problem in the firing code.
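A check like the following would have caught this much sooner. It is only a sketch: it assumes the distributions can be snapshotted before and after an update as nested dicts (distribution name to rule to probability), which is a made-up format and not how the generatedOutput file is actually laid out.

```python
import math


def changed_distributions(before, after, tol=1e-9):
    """Return the names of distributions whose probabilities differ.

    before, after -- dicts mapping a distribution name to a dict of
                     rule -> probability (hypothetical snapshot format)
    """
    changed = []
    for name, old in before.items():
        new = after.get(name, {})
        rules = set(old) | set(new)
        # A distribution counts as changed if any rule's probability moved
        # by more than the tolerance; missing rules are treated as 0.
        if any(
            not math.isclose(old.get(r, 0.0), new.get(r, 0.0), abs_tol=tol)
            for r in rules
        ):
            changed.append(name)
    return changed
```

An empty list after an update step means nothing moved, which matches what the output file showed.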