Progress: Update Policy SuperStating Working

Today I worked out the kinks in the updatePolicy method, which now also updates the superstates of a state whenever that state is updated. This should allow much faster learning when updating states of size less than 4.
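To make the superstate idea concrete, here is a minimal sketch in Python. It is not the project's actual code. It assumes a state is a tuple of height differences, that a full state has 4 entries, that a superstate is any larger state containing the smaller one as a contiguous run, and that values come from a tiny range (-1, 0, 1). The names superstates and update_policy, the dictionary policy and the learning rate are all made up for illustration.

```python
from itertools import product

MAX_STATE_SIZE = 4          # assumption: a full state holds 4 height differences
HEIGHT_VALUES = (-1, 0, 1)  # assumption: tiny value range to keep the example small


def superstates(substate, max_size=MAX_STATE_SIZE, values=HEIGHT_VALUES):
    """Return every larger state (up to max_size) containing substate as a contiguous run."""
    sub = tuple(substate)
    found = set()
    for size in range(len(sub) + 1, max_size + 1):
        extra = size - len(sub)
        # Choose the extra entries, then decide how many of them go on the left.
        for padding in product(values, repeat=extra):
            for left in range(extra + 1):
                found.add(padding[:left] + sub + padding[left:])
    return found


def update_policy(policy, substate, reward, rate=0.1):
    """Move the stored value of substate and of all its superstates towards reward."""
    for state in {tuple(substate)} | superstates(substate):
        old = policy.get(state, 0.0)
        policy[state] = old + rate * (reward - old)
    return policy


policy = update_policy({}, (-1, 1), reward=1.0)
```

A single update on {-1,1} touches every stored state that contains it, which is where the speed-up for small states would come from under these assumptions.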

Something to think about is updating only the smallest possible state the piece fits into. Say a T-piece lands on {-1,1,0} with its centre at 1.5 and rotated 90 degrees (so it's a snug fit): the only relevant part of the state is the {-1,1} bit. So when a piece lands, only the relevant part of the state needs to be sent to updatePolicy, which will update that state and all of its superstates. This should result in very fast learning.
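A rough sketch of picking out the relevant part, again in Python and again under assumptions. It treats the state as height differences between neighbouring columns, so a piece covering w columns touches w - 1 differences. How the centre and rotation map to columns is not worked out here, so the hypothetical relevant_substate function simply takes the piece's leftmost column and its width as inputs.

```python
def relevant_substate(contour, left_col, width):
    """Return the height differences lying under a piece.

    contour  : tuple of height differences between neighbouring columns
    left_col : index of the leftmost column the piece covers (hypothetical input)
    width    : number of columns the piece covers in its current rotation
    """
    num_columns = len(contour) + 1  # n differences describe n + 1 columns
    if width < 1 or left_col < 0 or left_col + width > num_columns:
        raise ValueError("piece does not fit inside the contour")
    # A piece spanning `width` columns covers `width - 1` differences.
    return tuple(contour[left_col:left_col + width - 1])


# The T-piece example: three columns wide, sitting over the first three columns.
part = relevant_substate((-1, 1, 0), left_col=0, width=3)
```

With these inputs part comes out as (-1, 1), which is the piece of state that would then be sent on to updatePolicy.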

Find piece rewards still needs to be modified to look for variable-sized states. Once that is done, the program should be functional again and can be tested on the lab comps.

Tests passing: 19/20