I ran a performance test over a day or so, and the results show that the three agents tested (standard, probabilistic and ASA) performed roughly equally. That's interesting, given that one-piece look-ahead should have an advantage over the other two.
The catch is that the probabilistic agent did do better, just not in overall reward gained. The other two agents went through far more episodes than the probabilistic agent did. So if playing to win, probabilistic would be better. But if playing to score big, the other two would be adequate.
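To make the difference concrete, here is a rough sketch of how equal totals can hide different per-episode performance. The numbers are made up purely for illustration (they are not the results from the test), and `per_episode` is just a helper name I'm assuming here.

```python
def per_episode(total_reward, episodes):
    """Average reward gained in a single episode."""
    return total_reward / episodes


# Made-up totals for illustration only; NOT the measured results.
runs = {
    "standard": {"total_reward": 1000.0, "episodes": 200},
    "probabilistic": {"total_reward": 1000.0, "episodes": 50},
}

# Same overall reward, but very different reward per episode.
averages = {
    name: per_episode(run["total_reward"], run["episodes"])
    for name, run in runs.items()
}

for name, average in averages.items():
    print(f"{name}: {average:.1f} reward/episode")
```

With equal totals, the agent that got there in fewer episodes comes out ahead per episode, which is the sense in which probabilistic "did do better".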
This suggests another possible performance measure: steps per episode. It shouldn’t be too tricky to keep track of. It will result in a larger performance file, though, so the file format will need reworking (and the old files disposed of).
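Something like the following might be enough for tracking steps per episode. It's only a sketch: the `EpisodeTracker` class, its method names and the CSV layout are all assumptions for illustration, not the existing performance file format.

```python
import csv
import io


class EpisodeTracker:
    """Hypothetical helper: records steps and reward for each episode."""

    def __init__(self):
        self.records = []   # one (episode, steps, reward) tuple per finished episode
        self._steps = 0     # steps taken in the current episode
        self._reward = 0.0  # reward gained in the current episode

    def step(self, reward):
        """Call once per agent step with the reward received."""
        self._steps += 1
        self._reward += reward

    def end_episode(self):
        """Store the finished episode and reset the running counters."""
        self.records.append((len(self.records) + 1, self._steps, self._reward))
        self._steps = 0
        self._reward = 0.0

    def mean_steps(self):
        """Average number of steps per finished episode."""
        if not self.records:
            return 0.0
        return sum(steps for _, steps, _ in self.records) / len(self.records)

    def write_csv(self, fileobj):
        """Write one row per episode (assumed layout, not the current format)."""
        writer = csv.writer(fileobj)
        writer.writerow(["episode", "steps", "reward"])
        writer.writerows(self.records)


# Two short made-up episodes, just to exercise the tracker.
tracker = EpisodeTracker()
for reward in (1.0, 0.0, 2.0):
    tracker.step(reward)
tracker.end_episode()
tracker.step(0.5)
tracker.end_episode()

buffer = io.StringIO()  # stands in for the performance file
tracker.write_csv(buffer)
```

One row per episode is what makes the file bigger; storing only a running mean would keep it small but lose the per-episode detail.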
Experiment mode is mostly finished on the visual side, but doesn’t have any interaction yet. It should be done by Thursday, I hope. Wednesday would be better, ’cause then I could start some experiments.