Progress: Tests complete + Detailed todo list

findPieceRewards didn’t really need much changing to handle the variable-sized states. It just needed to return a different value when the piece cannot be fit into a state on the field. With those few changes, the tests are all complete.

There are still problems to address though. For instance, when storing the reward, instead of finding all relevant states, a single variable-sized state could be returned, created to fit the location where the piece landed. So if a piece landed on a flat surface ({…,0,0,…}), only a substate {0} (or {0,0}, depending on the size of the piece) needs to be sent to updatePolicy, as updatePolicy creates the necessary substates for maximal updating. Note that if this is used as the algorithm, every single state will be updated whenever a vertical I-piece is dropped. To counter this, a vertical I-piece drop should return a flag stating that it is a vertical I-piece and should only go in special states. This will make the agent realise that the special states love vertical I-pieces and always strive to put them there.
Note that this does not render findRelevantStates obsolete (yet?) as findPieceRewards still uses it.
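
Here is a rough sketch of the landing-substate idea, in Python for brevity rather than the project's own code. It assumes a state is the list of height differences between adjacent columns, which is only my guess at the encoding from the {0} and {0,0} examples above. The names contour and landing_update are hypothetical and the flag is just a boolean. Under that assumption a one-column piece gives an empty substate, which would match the "updates every state" problem with vertical I-pieces.

```python
def contour(heights):
    """Height differences between adjacent columns (assumed state encoding)."""
    return [b - a for a, b in zip(heights, heights[1:])]


def landing_update(heights, left_col, piece_width, piece_height):
    """Return (substate, is_vertical_i) for a piece about to land.

    heights      -- column heights of the field before the piece lands
    left_col     -- leftmost column the piece will occupy
    piece_width  -- number of columns the piece spans in its current rotation
    piece_height -- number of rows the piece spans in its current rotation
    """
    if piece_width < 1 or left_col < 0 or left_col + piece_width > len(heights):
        raise ValueError("piece does not fit inside the field")

    # A piece spanning w columns sits on w - 1 adjacent height differences.
    diffs = contour(heights)
    substate = tuple(diffs[left_col:left_col + piece_width - 1])

    # Hypothetical flag: one column wide and four rows tall is a vertical I-piece.
    is_vertical_i = piece_width == 1 and piece_height == 4
    return substate, is_vertical_i


if __name__ == "__main__":
    field = [3, 3, 3, 1, 2]
    print(landing_update(field, 0, 2, 2))  # two-wide piece on the flat part
    print(landing_update(field, 0, 3, 2))  # three-wide piece on the flat part
    print(landing_update(field, 3, 1, 4))  # vertical I-piece in the well
```

The caller would then pass the substate to updatePolicy as usual, and use the flag to restrict the update to special states.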

Also, there is the problem of exploratory placement. This should be done with a bias towards certain states (as with the I-piece above) so that play is less random and more likely to net a reward. This ties in with a point raised in a previous post about trying to put pieces where they fit by mapping the piece contour to the field. With that in place, the exploratory phase becomes a less greedy phase rather than a random one: the agent tries to put the piece in one of these ideal locations, with a bias towards the lower ones.
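
As a sketch of what that biased exploration might look like (again Python, and not the actual agent code): the piece's bottom contour is matched against the field contour, and one of the matching columns is picked at random with more weight on the lower resting heights. The piece_bottom representation, the function names and the linear weighting are all assumptions made for illustration.

```python
import random


def fitting_columns(heights, piece_bottom):
    """Columns where the piece's bottom contour sits flush on the field.

    heights      -- column heights of the field
    piece_bottom -- for each piece column, how far its lowest cell sits
                    above the piece's lowest row (e.g. [0, 0] for an O-piece)
    """
    width = len(piece_bottom)
    field_diffs = [b - a for a, b in zip(heights, heights[1:])]
    piece_diffs = [b - a for a, b in zip(piece_bottom, piece_bottom[1:])]
    return [
        col
        for col in range(len(heights) - width + 1)
        if field_diffs[col:col + width - 1] == piece_diffs
    ]


def choose_exploratory_column(heights, piece_bottom, rng=random):
    """Pick a flush-fitting column at random, biased towards lower spots.

    Returns None when the piece fits nowhere, so the caller can fall back
    to whatever placement it used before.
    """
    candidates = fitting_columns(heights, piece_bottom)
    if not candidates:
        return None
    # Resting height of the piece's lowest row at each candidate column.
    rests = [heights[col] - piece_bottom[0] for col in candidates]
    highest = max(rests)
    # Linear bias: the lowest candidate gets the largest weight.
    weights = [highest - rest + 1 for rest in rests]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    field = [2, 2, 0, 0, 1]
    print(fitting_columns(field, [0, 0]))
    print(choose_exploratory_column(field, [0, 0], random.Random(1)))
```

The linear weights are the simplest thing that favours low placements. A steeper bias would make the exploration greedier, which is a tuning question I haven't looked at.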

Looking at the results of some tests, paving has proven to be a little bothersome. Because paved states sit over holes, the agent may learn that putting J-pieces on them is a good idea, so that when an unpaved state with the same vector comes along, the agent places the piece badly. Part 2 of the paving problem is that a paved state sits on top of a special state, so when the special state receives a reward, the paved state receives it too, along with all of its superstates. This may lead to I-pieces being put all over the place.
I think I jumped too hastily into implementing paved states as a cop-out for dealing with special states. They may need to be removed, and code put in for dealing with special states and non-I-pieces. For instance, a J-piece would fit well in a left-wall special state.

Next up is dealing with states that encompass a large height, for instance {2,2,2}. Whether these states become a problem is yet to be seen, but I’m sure the agent would benefit from placing pieces at a lower point on the field.

I need to do a lab test with what I have now, but I doubt it will go well with all this stuff still to do. Who knows? It may actually learn something.
