Have completed work on the eligibility trace with a length of 10. It should work in theory: it has been tested via JUnit and passes. Can’t test in the labs until Wednesday though. Maybe Tuesday if I’m lucky.
As a result of adding eligibility traces, the agent should now place pieces better when running exploitatively.
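For the record, the kind of update I mean is roughly the following. This is only a hypothetical Java sketch, not the actual class from the project; the names and constants (learning rate, decay) are made up for illustration. The idea is that only the 10 most recently visited states carry eligibility, each decayed by how long ago it was visited.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a length-limited eligibility trace:
// only the TRACE_LENGTH most recent states receive TD updates.
public class TraceSketch {
    static final int TRACE_LENGTH = 10;
    static final double ALPHA = 0.1;   // learning rate (assumed value)
    static final double LAMBDA = 0.9;  // trace decay (assumed value)

    final double[] values;
    final Deque<Integer> trace = new ArrayDeque<>();

    TraceSketch(int numStates) {
        values = new double[numStates];
    }

    // Record a visit to `state` and spread the TD error back along the trace.
    void visit(int state, double tdError) {
        trace.addFirst(state);
        if (trace.size() > TRACE_LENGTH) {
            trace.removeLast();        // forget states older than the window
        }
        double decay = 1.0;
        for (int s : trace) {          // iterates most-recent state first
            values[s] += ALPHA * decay * tdError;
            decay *= LAMBDA;
        }
    }
}
```

The window plus decay means a good (or bad) outcome credits not just the last placement but the handful of placements leading up to it, which is what should improve exploitative play.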
I guess this means Version 1.3 is complete, but it feels like too little work to call a version. Depends on the lab results I guess.
Also experimenting in advance with no culled states. This is likely to be a bad move without guidance, and I predict it will end badly when the agent chooses to act exploitatively. Need to add code that biases results towards lower states for greedy choosing.
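What I have in mind for that bias is something like the sketch below. It is hypothetical (the class, the bonus weight, and the row convention are all made up, not project code): nudge each candidate move’s estimated value by a small bonus proportional to how low on the board the piece would land, so that ties and near-ties resolve towards lower placements.

```java
// Hypothetical sketch of greedy selection biased toward lower placements.
// Convention assumed here: row 0 is the top of the board, so a larger
// landing row means the piece sits lower.
public class GreedyBias {
    static final double HEIGHT_BONUS = 0.01; // assumed bias weight, untuned

    // Returns the index of the move with the best biased score.
    static int pick(double[] estimatedValues, int[] landingRows) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < estimatedValues.length; i++) {
            double score = estimatedValues[i] + HEIGHT_BONUS * landingRows[i];
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

The bonus weight would need to stay small so it only breaks ties rather than overriding a genuinely better-valued placement.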
Still to do:
– During exploration, bias pieces to place themselves horizontally to take up maximum area and reduce jaggedness of terrain.
– Perhaps experiment with an alternative method of play. This would be a drastic move and a big change, but the results of TD-Gammon speak for themselves. The idea is to change the reward storage for play: when a piece is introduced, it is evaluated at every possible position, and the likely reward is estimated by taking into account factors such as terrain bumpiness, height, the afterstate, etc. This method of looking ahead to see that a particular move is good is quite smart. Bernhard suggested it as a possible alternative if things turn pear-shaped.
– As an offshoot of that idea, need to add a check to the current code for whether a piece placement will complete a line.
– Need to experiment with multiple parameter settings for optimal learning.
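Two of the items above (the afterstate evaluation and the line-completion check) would probably share code, so here is a rough combined sketch. Everything in it is hypothetical: the feature weights are placeholders I made up, not tuned values, and the helper names are not from the project.

```java
// Hypothetical sketch of the afterstate-evaluation idea: score a candidate
// placement by features of the resulting board. board[row][col]: true means
// occupied; row 0 is assumed to be the top of the board.
public class AfterstateSketch {
    // The line-completion check: count fully occupied rows.
    static int completedLines(boolean[][] board) {
        int lines = 0;
        for (boolean[] row : board) {
            boolean full = true;
            for (boolean cell : row) {
                if (!cell) { full = false; break; }
            }
            if (full) lines++;
        }
        return lines;
    }

    // Height of each column, measured from the bottom of the board.
    static int[] columnHeights(boolean[][] board) {
        int rows = board.length, cols = board[0].length;
        int[] heights = new int[cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                if (board[r][c]) { heights[c] = rows - r; break; }
            }
        }
        return heights;
    }

    // Jaggedness of the terrain: sum of height differences between neighbours.
    static int bumpiness(int[] heights) {
        int b = 0;
        for (int c = 1; c < heights.length; c++) {
            b += Math.abs(heights[c] - heights[c - 1]);
        }
        return b;
    }

    // Combine the features into one score. Weights are illustrative only.
    static double evaluate(boolean[][] afterstate) {
        int[] h = columnHeights(afterstate);
        int maxHeight = 0;
        for (int x : h) maxHeight = Math.max(maxHeight, x);
        return 10.0 * completedLines(afterstate)
             - 0.5 * maxHeight
             - 0.25 * bumpiness(h);
    }
}
```

Evaluating `evaluate` on the afterstate of every legal position of the incoming piece, then taking the best, is the look-ahead style of play described above.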