Progress: Results over all MDPs

Ran a modified consoleTrainer yesterday and collected results for all of the Tetris MDPs.

More to come: I have class. Note that parameter 6 is particularly nasty.
Back. Here are the results for each parameter setting (a / separates the runs, which are listed in this order):
Initial run / eTemp’d run / BETA = 10 / BETA = 20 / Slower Cooling / Looser Parameters / Midway Parameters
Parameter 0
Average lines: 2319 / 3009 / 2537 / 4724 / 694 / 2320 / 2680
Average reward: 2946 / 3820 / 3224 / 5861 / 913 / 2934 / 3399

Parameter 1
Average lines: 159 / 142 / 76 / 70 / 83 / 184 / 151
Average reward: 175 / 155 / 87 / 78 / 91 / 203 / 162

Parameter 2
Average lines: 106 / 123 / 117 / 135 / 49 / 126 / 122
Average reward: 136 / 156 / 145 / 148 / 62 / 155 / 158

Parameter 3
Average lines: 466 / 482 / 128 / 410 / 121 / 622 / 298
Average reward: 483 / 502 / 133 / 427 / 127 / 641 / 311

Parameter 4
Average lines: 381 / 212 / 281 / 245 / 136 / 259 / 292
Average reward: 404 / 229 / 298 / 261 / 145 / 274 / 310

Parameter 5
Average lines: 145 / 259 / 201 / 202 / 137 / 78 / 199
Average reward: 187 / 344 / 267 / 268 / 181 / 113 / 257

Parameter 6
Average lines: 44 / 56 / 50 / 55 / 48 / 54 / 36
Average reward: 62 / 81 / 69 / 77 / 68 / 74 / 50

Parameter 7
Average lines: 77 / 82 / 67 / 116 / 59 / 53 / 70
Average reward: 80 / 84 / 70 / 120 / 61 / 54 / 74

Parameter 8
Average lines: 918 / 666 / 1225 / 1145 / 258 / 1310 / 1189
Average reward: 969 / 716 / 1288 / 1199 / 277 / 1374 / 1250

Parameter 9
Average lines: 66 / 81 / 78 / 63 / 57 / 67 / 80
Average reward: 87 / 106 / 111 / 83 / 79 / 97 / 111

Parameter 10
Average lines: 161 / 130 / 149 / 139 / 60 / 108 / 144
Average reward: 227 / 176 / 201 / 193 / 88 / 152 / 203

Parameter 11
Average lines: 229 / 553 / 307 / 499 / 178 / 317 / 379
Average reward: 244 / 581 / 322 / 520 / 186 / 337 / 402

Parameter 12
Average lines: 35 / 63 / 72 / 63 / 57 / 70 / 69
Average reward: 40 / 70 / 77 / 68 / 61 / 75 / 74

Parameter 13
Average lines: 54 / 41 / 36 / 51 / 31 / 42 / 47
Average reward: 69 / 55 / 47 / 65 / 40 / 53 / 62

Parameter 14
Average lines: 32 / 62 / 45 / 49 / 43 / 59 / 56
Average reward: 34 / 64 / 47 / 51 / 45 / 62 / 59

Parameter 15
Average lines: 445 / 467 / 381 / 484 / 185 / 406 / 501
Average reward: 459 / 484 / 394 / 501 / 191 / 420 / 518

Parameter 16
Average lines: 2080 / 1427 / 1682 / 874 / 362 / 974 / 1414
Average reward: 2573 / 1762 / 2087 / 1090 / 473 / 1219 / 1757

Parameter 17
Average lines: 47 / 53 / 34 / 29 / 44 / 47 / 42
Average reward: 51 / 56 / 36 / 31 / 47 / 50 / 45

Parameter 18
Average lines: 652 / 452 / 453 / 444 / 167 / 569 / 547
Average reward: 673 / 474 / 474 / 458 / 176 / 595 / 573

Parameter 19
Average lines: 1163 / 799 / 989 / 766 / 217 / 711 / 790
Average reward: 1247 / 870 / 1066 / 837 / 241 / 767 / 864
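
A quick way to compare the runs is to count, per MDP, which run cleared the most lines, and which MDPs one run improved on relative to another. This is only a rough sketch: the Average lines rows are copied from the table above, and the names RUNS, AVG_LINES, wins_per_run and improved_over are made up for the sketch (they are not part of consoleTrainer).

```python
# Run names, in the same order as the "/"-separated columns of the table.
RUNS = ["Initial", "eTemp'd", "BETA=10", "BETA=20",
        "Slower Cooling", "Looser", "Midway"]

# Average lines per MDP (row i is Parameter i), copied from the table.
AVG_LINES = [
    [2319, 3009, 2537, 4724, 694, 2320, 2680],
    [159, 142, 76, 70, 83, 184, 151],
    [106, 123, 117, 135, 49, 126, 122],
    [466, 482, 128, 410, 121, 622, 298],
    [381, 212, 281, 245, 136, 259, 292],
    [145, 259, 201, 202, 137, 78, 199],
    [44, 56, 50, 55, 48, 54, 36],
    [77, 82, 67, 116, 59, 53, 70],
    [918, 666, 1225, 1145, 258, 1310, 1189],
    [66, 81, 78, 63, 57, 67, 80],
    [161, 130, 149, 139, 60, 108, 144],
    [229, 553, 307, 499, 178, 317, 379],
    [35, 63, 72, 63, 57, 70, 69],
    [54, 41, 36, 51, 31, 42, 47],
    [32, 62, 45, 49, 43, 59, 56],
    [445, 467, 381, 484, 185, 406, 501],
    [2080, 1427, 1682, 874, 362, 974, 1414],
    [47, 53, 34, 29, 44, 47, 42],
    [652, 452, 453, 444, 167, 569, 547],
    [1163, 799, 989, 766, 217, 711, 790],
]


def wins_per_run(table, runs=RUNS):
    """Count, for each run, the MDPs on which it has the highest value.

    Ties go to the earliest run in the column order.
    """
    wins = {name: 0 for name in runs}
    for row in table:
        best = row.index(max(row))
        wins[runs[best]] += 1
    return wins


def improved_over(table, run, baseline):
    """Return the MDP indices where column `run` strictly beats `baseline`."""
    return [mdp for mdp, row in enumerate(table) if row[run] > row[baseline]]


if __name__ == "__main__":
    for name, count in wins_per_run(AVG_LINES).items():
        print(f"{name:>15}: best on {count} MDPs")
    slower = RUNS.index("Slower Cooling")
    print("Slower Cooling beats Initial on MDPs:",
          improved_over(AVG_LINES, slower, RUNS.index("Initial")))
```

The same two functions work on the Average reward rows if those are pasted in instead.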

These are actually quite disgusting results. I was about to blame them on small, hard-to-play training fields, but nearly all of the results are terrible. I think the problem is either that the temperature cools far too fast (or maybe it isn’t reset at all!) or that the initial parameters overfit to the first MDP.

Fuck! I just found the problem. The eTemp isn’t reset at all. Dammit! Have to do them again!
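
To make the bug concrete, here is a minimal sketch of what a missing reset does. It assumes eTemp is an exploration temperature that is multiplied by a cooling rate on every step; the class, its method bodies and the numbers are hypothetical and are not the actual agent code. Without a reset at the start of each MDP, every later MDP starts with an already-cooled temperature.

```python
class AnnealingAgent:
    """Hypothetical agent with a multiplicatively cooled temperature."""

    def __init__(self, initial_temp=1.0, cooling_rate=0.99):
        self.initial_temp = initial_temp
        self.cooling_rate = cooling_rate
        self.temp = initial_temp

    def agent_init(self, reset_temp=True):
        # Called once at the start of each MDP. The bug described above
        # corresponds to reset_temp=False: the temperature carries over.
        if reset_temp:
            self.temp = self.initial_temp

    def agent_step(self):
        # Assumed cooling rule: geometric decay once per step.
        self.temp *= self.cooling_rate
        return self.temp


def starting_temps(n_mdps, steps_per_mdp, reset_temp):
    """Return the temperature each MDP starts with."""
    agent = AnnealingAgent()
    starts = []
    for _ in range(n_mdps):
        agent.agent_init(reset_temp=reset_temp)
        starts.append(agent.temp)
        for _ in range(steps_per_mdp):
            agent.agent_step()
    return starts


if __name__ == "__main__":
    print("with reset:   ", starting_temps(3, 100, reset_temp=True))
    print("without reset:", starting_temps(3, 100, reset_temp=False))
```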

Edit: Added in the eTemp results. They’re practically the same, give or take some reward. As I said in the (previous?) post, resetting eTemp in agent_init did nothing for these results.

Also added in the BETA results. BETA appears to have no large effect on the results; possibly a negative one, but that is purely speculation. I think perhaps the cooling rate needs to be adjusted.
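
For a feel of how much the cooling rate matters, here is a small sketch that counts how many steps it takes for the temperature to fall below a threshold. It assumes geometric cooling (temperature multiplied by a fixed rate between 0 and 1 each step), which may not match the schedule the agent actually uses; the function name and the example numbers are made up for illustration.

```python
def steps_until_below(initial_temp, cooling_rate, threshold):
    """Count geometric cooling steps until the temperature is below threshold.

    Assumes temp <- temp * cooling_rate on every step, with 0 < rate < 1.
    """
    if not 0.0 < cooling_rate < 1.0:
        raise ValueError("cooling_rate must be strictly between 0 and 1")
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    temp = initial_temp
    steps = 0
    while temp >= threshold:
        temp *= cooling_rate
        steps += 1
    return steps


if __name__ == "__main__":
    # Rates closer to 1 cool more slowly, so exploration lasts longer.
    for rate in (0.5, 0.9, 0.99, 0.999):
        print(rate, steps_until_below(1.0, rate, 0.1))
```

Comparing the step counts against the length of a typical episode gives a rough idea of whether exploration is over before the agent has seen much of each MDP.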

Edit: Added in the lower cooling rate (Slower Cooling) results. Absolutely disgusting. Not a single MDP did better than in the eTemp’d run. Perhaps try a higher cooling rate? Also added in the looser parameter tests. Not too bad. Roughly equal to the regular style. Perhaps try putting the parameters halfway between the two settings.

Edit: Added in midway parameter results. Doesn’t appear to be particularly different. These changes haven’t been drastic enough.