I implemented probability look-ahead play today and am currently testing it. The biggest problem is that it takes *much* longer to choose each move.

The algorithm is like this:

– The current piece is evaluated in every position as normal, but its value is now a weighted sum of the field values after the next piece has dropped. Each possible next piece gets a weight of 1/7 (7 pieces), or whatever frequency the agent has observed for it.
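As a rough illustration of the step above, here is a minimal Python sketch. Everything in it is an assumption made for the example, not the agent's actual code: the field is reduced to a tuple of column heights, a piece is just a 1-wide block of a given height, and `placements`, `drop`, `evaluate`, `lookahead_value` and `best_position` are hypothetical names. It also assumes the agent scores each possible next piece by its best placement, which the description above does not spell out.

```python
from typing import Dict, Iterable, Tuple

Field = Tuple[int, ...]  # toy field (assumption): one stack height per column
Piece = int              # toy piece (assumption): a 1-wide block of this height


def placements(field: Field, piece: Piece) -> Iterable[int]:
    """All positions for a piece; in this toy model, one per column."""
    return range(len(field))


def drop(field: Field, piece: Piece, column: int) -> Field:
    """Return the field after dropping `piece` into `column`."""
    heights = list(field)
    heights[column] += piece
    return tuple(heights)


def evaluate(field: Field) -> float:
    """Toy field value: lower stacks are better."""
    return -float(max(field))


def lookahead_value(field: Field, piece: Piece, column: int,
                    next_piece_probs: Dict[Piece, float]) -> float:
    """Value of one position for the current piece, built from fractions
    of the field values after each possible next piece has dropped."""
    after_current = drop(field, piece, column)
    expected = 0.0
    for next_piece, prob in next_piece_probs.items():
        # Assumption: the next piece is scored by its best placement.
        best_next = max(
            evaluate(drop(after_current, next_piece, c))
            for c in placements(after_current, next_piece)
        )
        expected += prob * best_next
    return expected


def best_position(field: Field, piece: Piece,
                  next_piece_probs: Dict[Piece, float]) -> int:
    """Pick the position for the current piece with the highest value."""
    return max(
        placements(field, piece),
        key=lambda c: lookahead_value(field, piece, c, next_piece_probs),
    )
```

With seven equally likely pieces the weights would be `{p: 1 / 7 for p in pieces}`; observed frequencies can be passed in the same dictionary instead.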

The problem here is that the old strategy only tried the current piece in every possible position (N). The new strategy does that and then, for each of those positions, tries each of the 7 possible next pieces in every position as well, so it takes N*N*7 steps. With N = 17, the original strategy takes 17 steps, whereas the new strategy takes 2023 steps. That’s 119 times slower than the original algorithm.
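The arithmetic can be checked with a few lines of Python. The function name `evaluation_counts` is made up for this example; it just counts steps using the N and N*N*7 figures from the paragraph above.

```python
def evaluation_counts(n_positions: int, n_pieces: int = 7):
    """Return (old steps, new steps, slowdown factor) for the two strategies."""
    old_steps = n_positions                           # current piece only
    new_steps = n_positions * n_positions * n_pieces  # plus every next piece
    return old_steps, new_steps, new_steps // old_steps


print(evaluation_counts(17))  # (17, 2023, 119)
```

The slowdown factor is N*7 in general, so it grows with the number of positions as well as with the number of pieces.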

This strategy is likely to get better results, but right now it takes too long to be useful. It should be very useful in 2-piece Tetris, though, because there the agent *knows* what the next piece will be.

So this will not be further explored until after the competition.

The next idea is now to use the y-height of the field to determine if a position is game ending.