Major Update: There was a bug in the code. Because the experiment needs to maintain two policies, only the agent-side one was being updated. This was fixed by passing the fired array back from the agent to the experiment.
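For context, here's a minimal sketch of what that fix amounts to, in Java with hypothetical class and method names (Agent.getFiredRules, Experiment.updateDistributions, etc. are illustrative only, not the real identifiers): the agent records which rules fired during the episode and hands that array back, and the experiment then performs the fired-rule-only update on its own copy of the policy instead of the agent updating a local copy.

    import java.util.Arrays;

    // Hypothetical sketch: the policy is just an indexed set of rules here.
    class Policy {
        private final String[] rules;
        Policy(String[] rules) { this.rules = rules; }
        int size() { return rules.length; }
    }

    class Agent {
        private final Policy policy;
        private final boolean[] fired; // which rules actually fired this episode

        Agent(Policy policy) {
            this.policy = policy;
            this.fired = new boolean[policy.size()];
        }

        /** Run one episode; mark a rule as fired whenever it selects an action. */
        double runEpisode() {
            Arrays.fill(fired, false);
            // ... environment loop goes here; when rule i fires, set fired[i] = true
            return 0.0; // episode score
        }

        /** The fix: expose the fired array so the experiment can use it. */
        boolean[] getFiredRules() {
            return fired;
        }
    }

    class Experiment {
        void updateDistributions(Policy policy, boolean[] fired, double score) {
            for (int i = 0; i < policy.size(); i++) {
                if (fired[i]) {
                    // update the distribution for this rule only (fired-rule update)
                }
            }
        }

        void step(Policy policy, Agent agent) {
            double score = agent.runEpisode();
            // Previously the agent updated its own policy copy; now the experiment
            // receives the fired array and performs the update itself.
            updateDistributions(policy, agent.getFiredRules(), score);
        }
    }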
The test of using fired policies (only updating the rules that actually fired from the policy) returned horrendous results. I suspect these are caused by a disastrous bug in the code, perhaps one that uses the opposite (non-firing) rules in the updates. I'll have to run some tests to find out. Unfortunately, due to some changed code, the policies weren't output, but I doubt they would have helped anyway.
Anyway, here’re the results for 25 episodes:
Average episode elite scores:
23476.865
13829.603
9238.333
5507.666
3088.8
2431.7334
1595.2666
943.6001
897.19995
864.0
836.46655
795.5999
897.7333
815.26666
834.1334
846.2668
806.5999
818.4666
805.0002
827.3334
791.8667
834.66656
828.3999
835.26666
824.6
As you can see, they get progressively worse. I'll do a regular run now to see whether the bug is in the overall code. Also, I need a hand-coded baseline to compare altered versions against.
Update: Looking at the generatedOutput file, it appears that none of the distributions were updated at all. All that was updated was the policy slot values, which dropped dramatically, causing the policies to be empty. So there was no bug in the policy output; the agent was simply playing completely randomly. Clearly, there is a problem in the firing code.
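A check along these lines (again just a sketch with hypothetical names, not the actual code) would have flagged the problem immediately: compare the distribution values before and after an update and warn if nothing changed, or if the sampled policy comes out empty.

    import java.util.Arrays;

    // Hypothetical sanity check on each update step.
    class UpdateSanityCheck {
        static void check(double[] before, double[] after, int policySize) {
            if (Arrays.equals(before, after))
                System.err.println("WARNING: distributions unchanged after update");
            if (policySize == 0)
                System.err.println("WARNING: sampled policy is empty - agent will act randomly");
        }
    }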