After a ‘learn the best’ run (eTemp set to 0.5 with no cooling), the results may be a bit more realistic, even if they look disappointing. The problem is that nearly all of the other fields are smaller, or have a less balanced proportion of pieces, than the ‘fair’ first MDP, which emulates Tetris faithfully.
During and after the ‘learn the best’ run on MDP 14 (7 wide), the parameter didn’t change wildly or invert, and the average number of lines was roughly equal to the results already obtained.
Nonetheless, I am currently running with a lower cooling rate to see if the agent needs more exploratory time.
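To put rough numbers on "more exploratory time", here is a minimal sketch. It assumes the temperature decays geometrically, shrinking by a fixed fraction each episode. That schedule is an assumption for illustration, not necessarily how eTemp is actually cooled, and the function names and rates below are hypothetical.

```python
import math


def cooled_temperature(initial_temp, cooling_rate, episode):
    """Temperature after `episode` episodes, ASSUMING geometric cooling."""
    return initial_temp * (1.0 - cooling_rate) ** episode


def episodes_until(initial_temp, cooling_rate, threshold):
    """Episodes needed before the temperature falls to `threshold` or below."""
    if cooling_rate <= 0.0:
        return math.inf  # no cooling: the temperature never drops
    if initial_temp <= threshold:
        return 0
    # Solve initial_temp * (1 - rate)^n <= threshold for the smallest integer n.
    return math.ceil(
        math.log(threshold / initial_temp) / math.log(1.0 - cooling_rate)
    )


if __name__ == "__main__":
    # Hypothetical rates: compare how long exploration lasts under each.
    for rate in (0.01, 0.001):
        n = episodes_until(1.0, rate, 0.5)
        print(f"cooling rate {rate}: temperature halves after {n} episodes")
```

Under this assumed schedule, dividing the cooling rate by ten gives the agent roughly ten times as many episodes before the temperature halves, which is the extra exploratory time the lower-cooling run is meant to test.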
Edit: Gah. After another LTB run, the results came out 2-3 times better. On MDP 6, the runs average about 50 lines; the LTB run netted an estimated 160. Perhaps the lower cooling rate will help. Otherwise, I’m gonna have to look at a different default parameter.
Edit again: The cooling run finished on the first MDP and the results are much worse. I’ll leave a lower-cooling run going overnight and check the overall results. I’ll also leave a run with a less defined default parameter going overnight (normal cooling).