Getting closer to the agent choosing actions greedily. Estimated date by which all tests should be passing and the agent (should) be working (well or not, I don’t yet know): Tuesday 5th February.
A point of interest arose when writing the findPieceRewards function (a function which finds the reward associated with each position of the current piece, to be used when finding a goal position). The reward associated with each piece can be one of two things:
– The max reward given to the piece’s position by the policy, which takes in a SubState and a Tetromino. Note that a Tetromino can be encompassed by more than one SubState, if it is small enough.
– The sum of all rewards given to the piece’s position from all encompassing SubStates.
After a bit of thought, I decided against the second approach. Though it does promote piece placement in good places, it is biased towards central states and small pieces. For example, a vertical I-piece could net the rewards of 4 SubStates whereas a horizontal one nets only 1. Also, wall states, which I have previously stated are desirable, are limited to a single SubState’s reward.
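The bias can be made concrete with a small sketch. All names here are illustrative, not the project’s real API: I assume each placement comes with a list of the rewards assigned by its encompassing SubStates, and compare the two aggregation options.

```python
def piece_reward_max(substate_rewards):
    """Option 1: take the max reward over all encompassing SubStates."""
    return max(substate_rewards)

def piece_reward_sum(substate_rewards):
    """Option 2: sum the rewards over all encompassing SubStates."""
    return sum(substate_rewards)

# A vertical I-piece can be encompassed by four SubStates, a horizontal one
# by only one. Summing inflates the vertical placement purely by coverage,
# even when each individual SubState rates the horizontal placement higher.
vertical = [1.0, 1.0, 1.0, 1.0]   # rewards from four encompassing SubStates
horizontal = [1.5]                 # reward from the single encompassing SubState

print(piece_reward_sum(vertical), piece_reward_sum(horizontal))  # 4.0 1.5
print(piece_reward_max(vertical), piece_reward_max(horizontal))  # 1.0 1.5
```

Under the sum, the vertical placement wins on coverage alone; under the max, the two placements compete on the actual per-SubState rating.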
What has been done:
Added 2 more functions, both tested:
– InitialisePieceLocations – creates an array of sets of pieces for every (1D) location that a piece could be at. For use when choosing a goal position.
– findPieceRewards – finds the rewards associated with a piece by looking at all orientations of the piece and returning the rewards associated with each. For use when choosing an action.
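A minimal sketch of what I understand findPieceRewards to do, enumerating every (orientation, location) pair and looking up its reward. The board width, the width-per-orientation representation, and the reward callback are all assumptions for illustration, not the real implementation.

```python
BOARD_WIDTH = 10

def legal_locations(piece_width):
    """All 1D column positions where a piece of this width fits on the board."""
    return range(BOARD_WIDTH - piece_width + 1)

def find_piece_rewards(orientation_widths, reward_fn):
    """For every orientation of the piece and every legal 1D location,
    record the reward the policy assigns to that placement.
    Returns a dict {(orientation, location): reward}."""
    return {
        (o, loc): reward_fn(o, loc)
        for o, width in enumerate(orientation_widths)
        for loc in legal_locations(width)
    }

# I-piece: orientation 0 is horizontal (width 4), orientation 1 vertical (width 1).
rewards = find_piece_rewards([4, 1], lambda o, loc: float(loc))
print(len(rewards))  # 7 horizontal locations + 10 vertical locations = 17
```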
Fixed up some problems with invalid piece checking.
Finished the greedyAction method, which works, though only in a vague sense (the test isn’t a great one). The test relies on updatePolicy, which is yet to be implemented; that’s next on my list.
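For reference, the greedy choice itself reduces to an argmax over the rewards that findPieceRewards produces. This is only a sketch of that idea, assuming the rewards arrive as a dict keyed by (orientation, location):

```python
def greedy_action(piece_rewards):
    """Return the (orientation, location) placement with the highest reward,
    as produced by findPieceRewards. Ties resolve to whichever key max()
    encounters first."""
    return max(piece_rewards, key=piece_rewards.get)

rewards = {(0, 2): 0.5, (1, 4): 1.2, (0, 7): -0.3}
print(greedy_action(rewards))  # (1, 4)
```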
Tests passing: 10/15