Blog – Page 65 – Sam Sarjant

08.01.0826.09.11

Progress: Further Expansion of Multiple Mini-States

The mini-states idea seems quite good, or at least better than the other ones. It needs a name. Perhaps Contoured Sub-States (CSS, taken already – twice. Oh well).

General Run-time Algorithm
When the program is running, the agent will need to:

Choose its mode: Exploitation or Exploration.
Exploitation

Choose a successful state with the given piece.
Rotate and move the piece to fit in the state properly.

Exploration

Choose a random state.
Choose a random rotation.
Choose a random location above the state (out of max 4 locations).

Observe the reward and store information.

This will require several things:

The sub-states are saved.
Piece-value numbers need to be saved to those states (7 in total).
Within each state, the location and orientation values must be saved for each piece (48 in total).

This results in each state having 48 extra bits of data affixed to it. Thus giving 16464 bits of data in total. Using the smaller state space (contours: {-2,…,2}) gives 6000. I’d like to start with the contour space of {-3,…,3} first though, and then modify it and check results. Note that only 48 data bits were there, not 55. This is because the 7 state wide bits of data can be gleaned from the 48 lower level bits.

The performance of 16464 states vs. 6000 states needs to be measured because I am unsure of how long the agent is allowed to ‘think’ about decisions before the piece falls. This is likely explained in the requirements and documentation of the problem.

A problem with CSS
Due to the nature of CSS, a problem has arisen. Because it doesn’t look at the whole field and only at a small part of it, lines may not be able to be made in the game, thus netting score. For instance, an agent may only focus on one sub-state and create a large, thick skyscraper of blocks. This won’t help in Tetris, so the problem needs to be addressed.

To stop this, states could be ignored if they are too far above the other states. For instance, the first state is recorded. Then the second state, which turns out to be N units above the bottom of the first state (lets use N=4). Then this second state is ignored as a viable state and the next state is looked at. This will not only reduce the state space for the agent to look at (not a big problem) but stop the ‘skyscraper effect’. The same goes for the first state. If it is high and the next state is N units lower, the first state is thrown away. Even if the first 4 states are high, then the 5th low, the first 4 are thrown away. Risky scenario in that situation, but we’ll see the results of it.

A general Tetris problem
When the stack gets high, the agent will have less time to move the piece about and rotate it. As the piece drops every time an action is chosen (perhaps even after t time also for slow thinking agents), the agent has limited time to get the piece how it needs to be. Using CSS, the agent could minimise the total number of states it has to look at and only look at the nearest ones given the amount of time for re-orientation.
Hmmm, that works well. A pleasant side-effect of CSS.

08.01.0803.07.12

Progress: Better State Definition Method

With the contour method giving a maximum of 40353607 (7^9) hideous states, something an agent could never work out, I decided I needed a better method of representation.

If the maximum size of a piece in the game of (standard) Tetris is 4 units, then the state space need only be 4×4. So, with the contour method once more in this reduced state space give maximum 343 states (7^3). Using an explicit method of showing states (i.e. assigning every unit in the 16 squares a value of 1 or 0) gives 65536 states (2^16).

The main problem with this approach is sub-dividing the field into 4×4 bits. The number of states within a field of 10 columns is 7 (Cols – Max Tetronimo Size + 1). The reason for this number is that many states will overlap one-another by at most 3 columns. Each of these sub-states may be at different heights as they would start from the lowest empty point (disregarding holes in the field) and capture the 4 rows above that point.

Once a field has been sub-divided and its many states identified, the agent will have to choose one of the states (perhaps using the average reward gained from that state with the given piece) and guide the piece towards it. The agent will need to take into account the rotation of the piece and check all other rotations against the states.

IMAGE LOST

Progress: TFR Scrapped, Contour Introduced

Hmm, after running some numbers using the TFR strategy on a field with 10 columns, the (max) number of states is 1.099511628 x 10^12 (2^40). Which is way too much. This could be done with TTR strategy, giving only 1048576 (2^20) states. This wouldn’t work well with L pieces and straight pieces.

TFR could still be used if nothing else is available with a small amount of state space reduction. By removing any bottom rows from the 4 rows that aren’t required, the state space can be something else. For example:
01111111 00000000 11110111 = 00000000 11110111 =/ 00000000 10101111 0111111100010000 00000000 00110000 = 00000000 11101111 =/ 00010000 10100110 00110000

However, even this won’t be enough to sufficiently reduce the state space.

Another strategy to take is using the contour method as seen here. The values will be all integers between (and including) {-3,…,3}. The contours are measured in relation to the current elevation of the preceding block. So the first column is taken as the base at the highest block and then numbers from {-3,…,3} are used to represent the differences in height of the blocks. Any difference outside the set is truncated to -3 or 3.
This method will give a maximum of 40353607 (7^9) states with 10 columns (9 = 10 -1). 7 is the number of possible values.

This strategy should be implemented before TFR strategy due to it’s higher chance of success.

The contour strategy would also benefit from state space reduction which will be looked at later.