I left two experiments running overnight on my home computer to test my theory about performance: the submitted agent versus the probabilistic agent.
The results are roughly what I expected, but not entirely. Although I can save the raw(ish) performance data of each agent, the number of episodes isn't saved with it. Anyway, the results:
Probabilistic agent:
Number of steps run over: 836026
Total reward: 63753
Av Reward/step: 0.076257197
Number of episodes: 64
Av Steps/episode: 13063
Submitted agent:
Number of steps run over: 592068
Total reward: 43547
Av Reward/step: 0.073550673
Number of episodes: 172
Av Steps/episode: 3442
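As a quick sanity check on the figures above, the per-step and per-episode averages fall straight out of the raw totals. The numbers below are copied from the experiment output; the helper function is just mine for checking the arithmetic:

```python
# Re-derive the reported averages from the raw totals.
def summarize(steps, reward, episodes):
    """Return (average reward per step, average steps per episode)."""
    return reward / steps, steps / episodes

runs = [
    ("Probabilistic", 836026, 63753, 64),
    ("Submitted", 592068, 43547, 172),
]
for name, steps, reward, episodes in runs:
    r_per_step, steps_per_ep = summarize(steps, reward, episodes)
    print(f"{name}: {r_per_step:.9f} reward/step, {steps_per_ep:.0f} steps/episode")
```

Both rows reproduce the averages reported above (13063 and 3442 steps/episode, rounded to the nearest whole step).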
As you can see, the probabilistic agent performed better on both average reward/step and average steps/episode. However, in this competition, failing an episode has little effect on agent performance. I expected the probabilistic agent to perform much better than the submitted agent, but it was only slightly better.
If both were run over 5 million steps at their current rates (although the submitted agent's performance seemed to still be increasing, according to the performance graph), their final results would be:
Probabilistic: 5000000 * 0.076257197 = 381286
Submitted: 5000000 * 0.073550673 = 367753
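The projection is just the measured reward/step rate scaled out to the full step budget; a one-liner confirms the totals:

```python
# Project each agent's total reward over a 5-million-step run
# from its measured average reward per step.
TOTAL_STEPS = 5_000_000
rates = {"Probabilistic": 0.076257197, "Submitted": 0.073550673}
for name, rate in rates.items():
    print(f"{name}: {round(TOTAL_STEPS * rate)}")
```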
These values seem much lower than what my submitted agent actually achieved in the testing performance graph. Of course, that could be because these figures come from a single, minimal-reward MDP.
Once the experimenter is set up in my GUI, I can run a proper head-to-head test of the two agents and see which comes out best. Also, if I get the class files from the Loria INRIA team, I can test their agent as well (assuming I can get the load-agent action to work properly).