Progress: TFR Scrapped, Contour Introduced

Hmm, after running some numbers using the TFR strategy on a field with 10 columns, the (max) number of states is 1.099511628 x 10^12 (2^40). Which is way too much. This could be done with TTR strategy, giving only 1048576 (2^20) states. This wouldn’t work well with L pieces and straight pieces.

TFR could still be used if nothing else is available with a small amount of state space reduction. By removing any bottom rows from the 4 rows that aren’t required, the state space can be something else. For example:

01111111    00000000
11110111 = 00000000
11110111 =/ 00000000
10101111    01111111

00010000    00000000
00110000 = 00000000
11101111 =/ 00010000
10100110    00110000

However, even this won’t be enough to sufficiently reduce the state space.

Another strategy to take is using the contour method as seen here. The values will be all integers between (and including) {-3,…,3}. The contours are measured in relation to the current elevation of the preceding block. So the first column is taken as the base at the highest block and then numbers from {-3,…,3} are used to represent the differences in height of the blocks. Any difference outside the set is truncated to -3 or 3.
This method will give a maximum of 40353607 (7^9) states with 10 columns (9 = 10 -1). 7 is the number of possible values.

This strategy should be implemented before TFR strategy due to it’s higher chance of success.

The contour strategy would also benefit from state space reduction which will be looked at later.

Progress: Initial Research and TFR

Started work on a strategy for tackling the Tetris problem. Got my first idea from another reduced version of Tetris which I can use to make my agent. Calling it the Top-Four-Rows (TFR) Strategy. This strategy cannot implement the symmetry that Top-Two-Rows (from Bdolah & Livnat’s Tetris) does as their Tetris have symmetrical pieces that can rotate to match one another due to the pieces being centrally symmetrical. I will have to find a way to reduce the state space for faster calculations.

The method involves looking at the top four rows of the playing field. This is the only relevant area for pieces of max 4 units and if the agent is playing well, it should always be the area to tackle. By the top four rows, I mean the highest row containing a block on the field and the 3 rows under that or the bottom 4 rows if the field is too low.

I foresee some problems with this method, such as having 5 rows containing pits and only the 4 rows are looked at, however, this could be changed once a line is made. Also at the highest stage of blocks (the blocks are stacked to the near top), this could present problems.

As a thought, the agent should change strategy/learn faster when the field grows too full. Something to look at once the TFR is implemented.

Another strategy could be a simple SARSA learning algorithm using Boltzmann selection.

Another idea to keep in mind is eligibility traces. Can’t quite see how it fits in yet, but I’ll keep it in the back of my mind.