Well, I can’t really say that they’re results, but they are interesting works-in-progress anyway. Normally, the experiments take about 1 hour per episode (thus a 25-episode run takes about a day), but these reintegration experiments using random rules still have a large amount of work to go.
The constant reintegration is at 84% after just under 25 hours, with an estimated four and a half hours to go. The delayed reintegration is at 70%, with an estimated 10 hours to go. The decaying reintegration, however, is at 97%. Perhaps something in these reintegration strategies is working well with the random rules. I don’t recall how long the reintegration strategy took with the hand-coded rules, but I have a feeling it was the experiment that I left running over the weekend, which makes its recorded runtime useless for comparison.
A possible explanation for this is that the hand-coded ruleset already has all the rules available to it, while the random ruleset is, well, random. But it has a massive database of rules, all learning concurrently, and these rules are being spread around among the slots, improving the overall performance.
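As a sanity check on those time estimates, here is a minimal sketch of a linear extrapolation: it assumes the remaining work proceeds at the same average rate as the work done so far. Whether the experiment framework computes its estimates this way is not something I have checked, and `remaining_hours` is a made-up helper name. The elapsed time of 24 hours is also an assumption, standing in for "just under 25 hours".

```python
def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimate hours left, assuming progress continues at the average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Work left divided by work done, scaled by the time the done part took.
    return elapsed_hours * (1.0 - fraction_done) / fraction_done


if __name__ == "__main__":
    elapsed = 24.0  # assumed value for "just under 25 hours"
    for name, done in [("constant", 0.84), ("delayed", 0.70), ("decaying", 0.97)]:
        print(f"{name}: about {remaining_hours(elapsed, done):.1f} hours to go")
```

With these assumed inputs the constant and delayed runs come out at roughly 4.6 and 10.3 hours, which is in line with the reported estimates of four and a half and 10 hours.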
It IS possible that the machines I am running the experiments on are simply running slowly, but only the final results will tell. I have a feeling that even variance won’t be able to explain away these possible performance boosts.
One of the policies from the constant reintegration clearly shows that the strategy is working:
[1]: if TO_FRUIT- then TO_CENTRE_OF_DOTS+
[1]: if NEAREST_GHOST>=21.0 and TO_POWER_DOT- then FROM_GHOST+
[1]: if MAX_JUNCTION_SAFETY>=19.0 and NEAREST_ED_GHOST>=10.0 then TO_ED_GHOST-
[1]: if TO_FRUIT- then TO_DOT+
[1]: if CONSTANT>=1.0 then FROM_GHOST+
[1]: if NEAREST_GHOST>=4.0 then TO_FRUIT+
[1]: if CONSTANT>=1.0 then FROM_GHOST+
[1]: if TO_FRUIT- then TO_DOT+
[1]: if TO_FRUIT- then TO_DOT+
[1]: if NEAREST_GHOST>=4.0 then TO_FRUIT+
[1]: if NEAREST_POWER_DOT<31.0 then TO_FRUIT-
[2]: if TO_CENTRE_OF_DOTS- then TO_DOT-
[2]: if TO_SAFE_JUNCTION+ then FROM_GHOST+
[2]: if TO_CENTRE_OF_DOTS- then TO_DOT-
[2]: if NEAREST_FRUIT>=2.0 then TO_DOT+
[2]: if MAX_JUNCTION_SAFETY>=2.0 then TO_DOT+
[2]: if FROM_GHOST+ then FROM_POWER_DOT+
[2]: if NEAREST_FRUIT>=2.0 then TO_DOT+
[2]: if NEAREST_FRUIT>=2.0 then TO_DOT+
[2]: if NEAREST_POWER_DOT<24.0 and TO_CENTRE_OF_DOTS+ then FROM_GHOST_CENTRE-
[2]: if TO_POWER_DOT+ then TO_SAFE_JUNCTION-
[2]: if TO_FRUIT- then TO_ED_GHOST+
[2]: if CONSTANT<1.0 and FROM_POWER_DOT+ then TO_CENTRE_OF_DOTS+
[2]: if TO_POWER_DOT+ then TO_ED_GHOST-
[3]: if NEAREST_GHOST>=4.0 and TO_ED_GHOST+ then KEEP_DIRECTION+
[3]: if FROM_GHOST+ then TO_FRUIT+
[3]: if DOT_CENTRE_DIST>=14.0 and DOT_CENTRE_DIST>=8.48528137423857 then TO_SAFE_JUNCTION+
[3]: if NEAREST_ED_GHOST<13.0 and NEAREST_GHOST<4.0 then TO_POWER_DOT-
[3]: if FROM_GHOST- then TO_POWER_DOT-
[3]: if CONSTANT<1.0 and NEAREST_DOT<12.0 then TO_FRUIT+
[3]: if TO_SAFE_JUNCTION+ then KEEP_DIRECTION-
[3]: if NEAREST_GHOST<10.0 then TO_SAFE_JUNCTION+
[3]: if NEAREST_GHOST<4.0 then TO_ED_GHOST-
[3]: if NEAREST_GHOST<10.0 then TO_SAFE_JUNCTION+
Note the multiple useful occurrences of ‘if TO_FRUIT- then TO_DOT+’, ‘if CONSTANT>=1.0 then FROM_GHOST+’ and others. Another benefit the reintegration strategy has over the (old) regeneration is that the rules are explicitly saved in full and won’t be subject to random boolean flips (regeneration only noted the name, e.g. TO_DOT, rather than the signed form TO_DOT+).
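The repeated rules can be tallied mechanically rather than by eye. The following is a rough sketch, not part of the experiment code: it parses lines in the `[slot]: if ... then ...` format shown in the listing and counts identical rules per slot. Only the slot 1 rules are copied in to keep it short, and the function names `count_rules` and `repeated` are purely illustrative.

```python
from collections import Counter

# Slot 1 of the policy listed above, copied verbatim.
POLICY_TEXT = """\
[1]: if TO_FRUIT- then TO_CENTRE_OF_DOTS+
[1]: if NEAREST_GHOST>=21.0 and TO_POWER_DOT- then FROM_GHOST+
[1]: if MAX_JUNCTION_SAFETY>=19.0 and NEAREST_ED_GHOST>=10.0 then TO_ED_GHOST-
[1]: if TO_FRUIT- then TO_DOT+
[1]: if CONSTANT>=1.0 then FROM_GHOST+
[1]: if NEAREST_GHOST>=4.0 then TO_FRUIT+
[1]: if CONSTANT>=1.0 then FROM_GHOST+
[1]: if TO_FRUIT- then TO_DOT+
[1]: if TO_FRUIT- then TO_DOT+
[1]: if NEAREST_GHOST>=4.0 then TO_FRUIT+
[1]: if NEAREST_POWER_DOT<31.0 then TO_FRUIT-
"""


def count_rules(policy_text: str) -> Counter:
    """Count how often each (slot, rule) pair appears in a policy listing."""
    counts = Counter()
    for line in policy_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split "[1]: if A then B" into the slot label and the rule text.
        slot, rule = line.split(": ", 1)
        counts[(slot.strip("[]"), rule)] += 1
    return counts


def repeated(counts: Counter) -> dict:
    """Keep only the rules that occur more than once within a slot."""
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (slot, rule), n in sorted(repeated(count_rules(POLICY_TEXT)).items()):
        print(f"slot {slot}: {n} x {rule}")
```

For slot 1 this reports three rules that occur more than once, with ‘if TO_FRUIT- then TO_DOT+’ appearing three times.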
I’ll come back tomorrow and see if my theories are correct. It certainly appears that way.