Well, they’re not all complete as of writing, but they should be done in about 20 minutes.
The results for the last regeneration experiment:
Regeneration, Single, Random rules, 25 episodes:
6494.667
7515.6
11342.466
13466.066
14755.133
15837.195
15969.732
16061.731
16405.2
17862.4
17656.469
16050.732
17832.201
17932.799
17153.133
17931.201
19750.797
18235.535
19097.4
23036.53
24929.938
26676.0
26230.064
29058.266
31211.797
Although the last few policies appear to have a higher disposition towards the TO_DOT and TO_FRUIT actions, this doesn’t seem to have helped much.
An extension experiment was the one using regeneration, individual, and the fired rules from the policy. My rationale was that by only noting the rules that actually fire, the more useful conditions would be kept. On further thought, though, this would likely have little effect on the results. For instance, the CONSTANT>=1 condition will always fire while CONSTANT<1 never will, yet the fired rules only note the fact that CONSTANT was used.
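To make that concrete, here’s a minimal sketch (with hypothetical condition tuples; not the actual implementation) of how recording only the fired predicates discards the operator and threshold:

```python
# A minimal sketch (hypothetical condition tuples, not the actual implementation)
# of why fired-rule recording loses information: only the predicate name is kept,
# not its operator or threshold.

def fired_predicates(conditions, state):
    """Return the names of the predicates whose conditions fired in this state."""
    fired = set()
    for predicate, op, value in conditions:
        observed = state[predicate]
        if (op == ">=" and observed >= value) or (op == "<" and observed < value):
            fired.add(predicate)  # the operator and value are discarded here
    return fired

state = {"CONSTANT": 1.0}
print(fired_predicates([("CONSTANT", ">=", 1)], state))  # {'CONSTANT'} - always fires
print(fired_predicates([("CONSTANT", "<", 1)], state))   # set() - never fires
```

Either way, all the record can say is whether CONSTANT was present in the rule.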
Regeneration w/ Fired Policy, Individual, Random rules, 25 episodes:
7187.533
9216.132
12385.267
14501.6
14825.134
15785.005
16270.668
17321.533
17830.332
18766.932
17174.867
18249.469
17806.805
19836.0
18673.664
19052.402
19289.936
20720.861
22502.332
25285.4
26317.4
34856.863
34416.133
37655.53
37828.004
These are better results, but the improvement could simply be down to variance; to find out, I’d have to run further experiments. Something of note is that the policy sizes in these experiments are much smaller, which is likely down to the fired rules strategy.
Something this regeneration strategy appears to need is the recording of more data. Merely noting down which conditions or actions are present in a rule isn’t enough; I need to note down their operators as well, and the index of their values if they are conditions, because it is far more useful to know an action was turned on than turned off. This entire experiment could simply be run by turning on actions at their appropriate priority levels, overwriting the older ones.
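Something like the following sketch is what I mean (the field names are hypothetical; this isn’t what’s recorded at the moment):

```python
# A rough sketch (hypothetical structure, not what the code currently records) of
# the richer per-rule data I have in mind.

from dataclasses import dataclass

@dataclass
class ConditionRecord:
    predicate: str    # e.g. "CONSTANT"
    operator: str     # e.g. ">=" or "<"
    value_index: int  # index into the condition's value set, not the raw value

@dataclass
class ActionRecord:
    action: str       # e.g. "TO_DOT" or "TO_FRUIT"
    enabled: bool     # turning an action on is the informative case
    priority: int     # the priority level the action was switched on at

# Regeneration could then just replay ActionRecords in priority order,
# overwriting older actions, rather than sampling whole rules again.
```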
Graphing the results
I need to be able to compare the reintegration results with the regeneration results directly. Unfortunately, they were run on different domains, so I’ll have to re-run the reintegration experiments after my meeting.
But, having looked at the results for the random rules vs. regeneration and hand-coded vs. reintegration, they appear to display interesting curves. The curves could simply be down to variance, but it may be worth investigating.
The reintegration results for the constant and decreasing variants appear to have a larger effect than no reintegration at the beginning of the experiment, but they eventually flatten out. The regeneration results, on the other hand, seem to indicate a trend of further growth past the 25-episode mark.
These theories are purely speculative and not yet backed by rigorous testing. I also haven’t put the regular random rules results up on the graph, so it could simply be a typical pattern for the experiment.
Hmmm… After putting the regular random results up, it’s hard to say what to make of it. The Individual regeneration strategy appears to be about as good as regular random rules, and the fired policy regeneration strategy starts off performing no better than the other two regeneration strategies, but then jumps to the best performance.
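For the direct comparison, a quick plotting sketch along these lines should be enough (the file names are placeholders for wherever I end up writing the per-episode averages):

```python
# A quick plotting sketch for putting the three runs on the same axes.
# The file names are placeholders for wherever the per-episode averages end up.

import matplotlib.pyplot as plt

runs = {
    "Regeneration, Single": "regen_single.txt",
    "Regeneration w/ Fired Policy, Individual": "regen_fired_individual.txt",
    "Random rules": "random_rules.txt",
}

for label, path in runs.items():
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    plt.plot(range(1, len(values) + 1), values, label=label)

plt.xlabel("Episode")
plt.ylabel("Average performance")
plt.legend()
plt.show()
```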
Random rule results, 25 episodes:
7459.1313
10125.933
11927.267
14470.334
16358.6
17745.4
18651.469
18242.266
18836.133
20196.332
20138.867
20114.732
21229.6
21675.332
23174.865
25090.467
29481.93
29963.867
31404.668
31850.662
31966.068
32551.664
33217.734
34160.867
32790.53