Bah. Stupid website. Losing my draft…
Here are the results from the last three experiments:
Reintegration, Delayed, Hand-coded, 25 episodes:
22538.332
37453.19
41644.734
43862.6
48394.805
50742.34
57591.4
59532.926
62933.41
68106.34
68237.54
70581.33
68203.87
72322.195
71891.66
76311.27
75801.14
79410.805
80507.33
81810.01
83625.88
87125.48
89817.22
87375.06
89742.81
These results show no advantage over the other strategies; if anything, they do slightly worse. To get conclusive results for this strategy I'd need to run it longer, but I doubt that would change the picture.
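To put a rough number on "slightly worse", averaging the last few episode scores gives a simple end-of-run measure. A minimal Python sketch, using the tail of the reintegration column above:

```python
def tail_mean(scores, k=5):
    """Average of the last k episode scores -- a crude end-of-run measure."""
    return sum(scores[-k:]) / k

# Last five values of the reintegration run pasted above:
reintegration_tail = [83625.88, 87125.48, 89817.22, 87375.06, 89742.81]
print(tail_mean(reintegration_tail))  # ~87537.3
```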
Regeneration, Individual, Random rule, 25 episodes:
7382.333
8927.801
11877.067
13024.336
15666.735
15498.87
16414.396
17349.6
17398.13
19362.467
19277.268
19673.865
23126.533
24377.262
28746.46
28005.465
31515.6
28781.865
32144.93
33061.87
33569.0
35274.26
35603.004
35700.594
35248.793
Regeneration, Policy, Random rule, 25 episodes:
6921.601
9167.467
12176.068
12847.331
14771.935
15334.799
15836.264
15945.733
17466.264
17415.998
17165.865
17435.602
19159.537
17705.002
20058.467
19008.197
18902.137
18979.002
18994.73
20221.865
23350.67
23401.402
22751.47
26970.127
28565.133
Looking at the results graphically, the individual rule bases seem to have a clear advantage over the policy-based ones.
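Something like the following reproduces the comparison graph (a minimal matplotlib sketch, not the actual experiment code; the lists are the three result columns above, truncated here to their first few values, so paste in the full 25 per run):

```python
import matplotlib.pyplot as plt

# The three result columns from above (only the first few values shown here).
runs = {
    "Reintegration / Delayed / Hand-coded": [22538.332, 37453.19, 41644.734],
    "Regeneration / Individual / Random":   [7382.333, 8927.801, 11877.067],
    "Regeneration / Policy / Random":       [6921.601, 9167.467, 12176.068],
}

# One learning curve per experiment, episode number on the x-axis.
for label, scores in runs.items():
    plt.plot(range(1, len(scores) + 1), scores, marker="o", label=label)

plt.xlabel("Episode")
plt.ylabel("Score")
plt.legend()
plt.show()
```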
There may be a problem with these experiments: the regeneration strategy uses all the rules in a policy for the condition- and action-generator updates. If regeneration were combined with the fired-policy approach (updating only from rules that actually fired), it might perform better. A sketch of the distinction follows.
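To make that distinction concrete, here's a hypothetical sketch of the two update sources. All names here (Rule, Generator, regenerate) are made up for illustration and are not the actual experiment code:

```python
import random

class Rule:
    def __init__(self, condition, action):
        self.condition, self.action = condition, action
        self.fired = False  # set True when the rule fires during an episode

class Generator:
    """Toy stand-in for a condition/action generator: keeps a pool of
    values seen in rules and samples new ones from it."""
    def __init__(self, pool):
        self.pool = list(pool)
    def update(self, value):
        self.pool.append(value)
    def sample(self):
        return random.choice(self.pool)

def regenerate(rules, cond_gen, act_gen, fired_only=False):
    # Current behaviour: feed *all* rules into the generator updates.
    # Proposed variant: feed only the rules that actually fired.
    source = [r for r in rules if r.fired] if fired_only else rules
    for r in source:
        cond_gen.update(r.condition)
        act_gen.update(r.action)
    # Replace the rule base with fresh rules from the updated generators.
    return [Rule(cond_gen.sample(), act_gen.sample()) for _ in rules]
```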
The next three experiments (I can only run three at a time) will be the last regeneration strategy, regular random rules (as a baseline to compare against), and the above case with individual rules.