Well, they’re not all complete as of writing, but they should be in about 20 minutes.
The results for the last regeneration experiment:
Regeneration, Single, Random rules, 25 episodes:
Although the last few policies appear to have a higher disposition towards the TO_DOT and TO_FRUIT actions, this doesn't seem to have helped much.
An extension of this experiment used regeneration, individual, and only the fired rules from the policy. My rationale was that by noting only the rules that actually fire, the more useful conditions would be kept. On further thought, though, this is likely to have little effect on the results. For instance, the CONSTANT>=1 condition will always fire and CONSTANT<1 never will, but the fired-rules record only notes that CONSTANT was used at all, as the sketch below shows.
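To make that concern concrete, here is a minimal sketch (hypothetical Python structures, not the actual policy representation) of how recording only the fired conditions discards the operator information that made the condition useful in the first place:

```python
# Hypothetical rule/condition structures for illustration only.
from dataclasses import dataclass

@dataclass
class Condition:
    variable: str   # e.g. "CONSTANT"
    operator: str   # e.g. ">=" or "<"
    value: float    # e.g. 1

def record_fired_conditions(conditions, state):
    """Note which conditions fired, but only by variable name.

    CONSTANT >= 1 fires in every state, while CONSTANT < 1 never does,
    yet the fired record for the first is simply 'CONSTANT was used';
    the operator and value are dropped.
    """
    fired = set()
    for cond in conditions:
        lhs = state[cond.variable]
        if (cond.operator == ">=" and lhs >= cond.value) or \
           (cond.operator == "<" and lhs < cond.value):
            fired.add(cond.variable)   # operator and value are lost here
    return fired
```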
Regeneration with Fired Policy, Individual, Random rules, 25 episodes:
These are better results, but perhaps it is just variance. To find out, I'd have to run further experiments. Something of note is that the policy sizes in these experiments are much smaller, which is likely down to the fired-rules strategy.
Something this regeneration strategy appears to need is the recording of more data. Merely noting which conditions or actions are present in a rule isn't enough: I need to record their operators as well, and, for conditions, the index of their values. It is far more useful to turn an action on than to turn it off; in fact, this entire experiment could be run simply by turning on actions at their appropriate priority levels, overwriting the older ones.
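Roughly, the richer record I have in mind might look like this (again a hypothetical Python sketch; the names and structures are assumptions, not the real implementation). Each noted condition keeps its operator and value index, and a policy can be rebuilt purely by switching actions on at priority levels, with later entries overwriting earlier ones:

```python
# Hypothetical richer record: keep operators and value indices for
# conditions, and rebuild the policy by switching actions on at their
# priority levels (later entries overwrite earlier ones).
from dataclasses import dataclass, field

@dataclass
class ConditionRecord:
    variable: str
    operator: str     # ">=", "<", etc.
    value_index: int  # index into the list of candidate values

@dataclass
class PolicyRecord:
    conditions: list = field(default_factory=list)
    actions: dict = field(default_factory=dict)  # priority level -> action

    def note_condition(self, variable, operator, value_index):
        self.conditions.append(ConditionRecord(variable, operator, value_index))

    def turn_on_action(self, action, priority):
        # Assigning to an existing priority level overwrites the older action.
        self.actions[priority] = action

# Example: regenerate a policy purely by turning actions on.
record = PolicyRecord()
record.turn_on_action("TO_FRUIT", priority=0)
record.turn_on_action("TO_DOT", priority=1)
record.turn_on_action("TO_FRUIT", priority=1)  # overwrites TO_DOT at level 1
```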
Graphing the results
I need to be able to compare the reintegration results with the regeneration results directly. Unfortunately, they were run on different domains, so I’ll have to re-run the reintegration experiments after my meeting.
But having looked at the results for random rules vs. regeneration and hand-coded vs. reintegration, the curves they display are interesting. They could simply be down to variance, but it may be worth investigating.
The reintegration results for constant and decreasing appear to have a larger effect than the runs without it at the beginning of the experiment, but they eventually flatten out, whereas the regeneration results seem to indicate a trend of further growth past the 25-episode mark.
These theories are purely speculative and not yet backed by rigorous testing. I haven't put the regular random rules results up on the graph yet, so this could be a typical pattern for the experiment.
Hmmm… After putting the regular random results up, it's hard to say what to make of them. The Individual regeneration strategy appears to be about as good as regular, and the fired-policy regeneration strategy starts off performing no better than the other two regeneration strategies, but then jumps to the best performance.
Random rule results, 25 episodes: