Progress: Further Expansion of Multiple Mini-States

The mini-states idea seems quite good, or at least better than the other ones. It needs a name. Perhaps Contoured Sub-States (CSS, taken already – twice. Oh well).

General Run-time Algorithm
When the program is running, the agent will need to:

Choose its mode: Exploitation or Exploration.
Exploitation

Choose a successful state with the given piece.
Rotate and move the piece to fit in the state properly.

Exploration

Choose a random state.
Choose a random rotation.
Choose a random location above the state (out of max 4 locations).

Observe the reward and store information.
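
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the final design: the names choose_action and record, the epsilon exploration rate, the alpha learning rate, the assumption of 4 rotations, and the dictionary keyed by (state, piece, rotation, offset) are all hypothetical choices made for the example.

```python
import random

def choose_action(piece, visible_states, values, epsilon=0.1, rng=random):
    """Pick (state, rotation, offset, mode) for the current piece.

    values maps (state, piece, rotation, offset) -> learned value.
    epsilon is an assumed exploration rate; the post does not fix one.
    """
    explore = rng.random() < epsilon
    if not explore:
        # Exploitation: look for a placement of this piece that has paid off before.
        known = [
            (v, key) for key, v in values.items()
            if key[0] in visible_states and key[1] == piece and v > 0
        ]
        if known:
            _, (state, _, rotation, offset) = max(known, key=lambda pair: pair[0])
            return state, rotation, offset, "exploit"
    # Exploration (also the fallback when nothing successful is known yet).
    state = rng.choice(visible_states)
    rotation = rng.randrange(4)  # assumes up to 4 rotations per piece
    offset = rng.randrange(4)    # max 4 locations above the state
    return state, rotation, offset, "explore"

def record(values, state, piece, rotation, offset, reward, alpha=0.1):
    """Observe the reward and fold it into the stored value (simple running average)."""
    key = (state, piece, rotation, offset)
    old = values.get(key, 0.0)
    values[key] = old + alpha * (reward - old)
```

Falling back to exploration when no successful placement is known keeps the agent from getting stuck early on, when the table is still empty.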

This will require several things:

The sub-states are saved.
Piece-value numbers need to be saved to those states (7 in total).
Within each state, the location and orientation values must be saved for each piece (48 in total).

This results in each state having 48 extra bits of data affixed to it, giving 16464 bits of data in total (343 states × 48). Using the smaller state space (contours: {-2,…,2}) gives 6000 (125 states × 48). I’d like to start with the contour space of {-3,…,3} first though, and then modify it and check the results. Note that only 48 bits of data are stored per state, not 55. This is because the 7 state-wide piece values can be gleaned from the 48 lower-level bits.
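
As a quick check of the arithmetic, the totals can be reproduced with a few lines. The function name table_size is hypothetical, and contours_per_state=3 is an assumption inferred from the totals (7^3 × 48 = 16464 and 5^3 × 48 = 6000), not something stated here.

```python
def table_size(contour_range, contours_per_state=3, values_per_state=48):
    """Return (number of sub-states, total stored values).

    contour_range=3 means each contour takes a value in {-3, ..., 3}.
    contours_per_state=3 is an assumption inferred from the totals in the text.
    """
    values_per_contour = 2 * contour_range + 1
    n_states = values_per_contour ** contours_per_state
    return n_states, n_states * values_per_state

print(table_size(3))  # (343, 16464)
print(table_size(2))  # (125, 6000)
```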

The performance with 16464 bits of data vs. 6000 needs to be measured, because I am unsure how long the agent is allowed to ‘think’ about a decision before the piece falls. This is likely explained in the requirements and documentation of the problem.

A problem with CSS
Due to the nature of CSS, a problem has arisen. Because it looks at only a small part of the field rather than the whole of it, the agent may never complete lines, which is what nets score. For instance, an agent may focus on only one sub-state and build a tall, thick skyscraper of blocks. This won’t help in Tetris, so the problem needs to be addressed.

To stop this, states could be ignored if they are too far above the other states. For instance, the first state is recorded. Then comes the second state, which turns out to be N units above the bottom of the first state (let’s use N=4). This second state is then ignored as a viable state and the next state is looked at. This will not only reduce the state space for the agent to look at (not a big problem) but also stop the ‘skyscraper effect’. The same goes for the first state: if it is high and the next state is N units lower, the first state is thrown away. Even if the first 4 states are high and the 5th is low, the first 4 are thrown away. That is a risky scenario, but we’ll see the results of it.
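
One possible reading of this rule, sketched below, is to compare every state against the lowest one and drop any whose bottom is N or more units higher. The name filter_skyscrapers and the (base_height, contour) representation are hypothetical, and comparing against the global minimum is a simplification of the state-by-state description above.

```python
def filter_skyscrapers(states, n=4):
    """Drop sub-states whose bottom sits n or more units above the lowest one.

    states is a list of (base_height, contour) pairs, left to right.
    Comparing against the global minimum is an assumed simplification.
    """
    if not states:
        return []
    lowest = min(height for height, _ in states)
    return [(height, contour) for height, contour in states if height - lowest < n]

# Four high states followed by one low state: only the low one survives.
field = [(6, (0, 1, 0)), (6, (1, 0, 0)), (7, (0, 0, -1)), (6, (-1, 0, 1)), (1, (0, 0, 0))]
print(filter_skyscrapers(field))  # [(1, (0, 0, 0))]
```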

A general Tetris problem
When the stack gets high, the agent will have less time to move the piece about and rotate it. As the piece drops every time an action is chosen (and perhaps also after some time t, for slow-thinking agents), the agent has limited time to get the piece into position. Using CSS, the agent could minimise the number of states it has to look at by considering only the nearest ones, given the amount of time available for re-orientation.
Hmmm, that works well. A pleasant side-effect of CSS.
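
A rough sketch of that idea follows, assuming the piece drops exactly one row per action and that a fixed number of actions is reserved for rotation. The name reachable_states, the (column, base_height, contour) tuples and the rotation_budget parameter are all hypothetical.

```python
def reachable_states(states, piece_col, piece_row, rotation_budget=3):
    """Keep only the sub-states the piece can still get to before it lands.

    states is a list of (column, base_height, contour) tuples.
    piece_row is the piece's height above the floor.
    Assumes one row of drop per action and reserves rotation_budget
    actions for rotating the piece (an assumed worst case).
    """
    reachable = []
    for col, base, contour in states:
        moves_available = piece_row - base           # rows left before landing here
        moves_needed = abs(col - piece_col) + rotation_budget
        if moves_needed <= moves_available:
            reachable.append((col, base, contour))
    return reachable

# With a high stack, only the nearby state is still in reach.
stack = [(0, 15, (0, 0, 0)), (4, 15, (1, 0, -1))]
print(reachable_states(stack, piece_col=5, piece_row=20))  # [(4, 15, (1, 0, -1))]
```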