Progress: Further Expansion of Multiple Mini-States

The mini-states idea seems quite good, or at least better than the other ones. It needs a name. Perhaps Contoured Sub-States (CSS, taken already – twice. Oh well).

General Run-time Algorithm
When the program is running, the agent will need to:

  1. Choose its mode: Exploitation or Exploration.
  2. Exploitation
  1. Choose a successful state with the given piece.
  2. Rotate and move the piece to fit in the state properly.
  • Exploration
    1. Choose a random state.
    2. Choose a random rotation.
    3. Choose a random location above the state (out of max 4 locations).
  • Observe the reward and store information.
  • This will require several things:

    • The sub-states are saved.
    • Piece-value numbers need to be saved to those states (7 in total).
    • Within each state, the location and orientation values must be saved for each piece (48 in total).

    This results in each state having 48 extra bits of data affixed to it. Thus giving 16464 bits of data in total. Using the smaller state space (contours: {-2,…,2}) gives 6000. I’d like to start with the contour space of {-3,…,3} first though, and then modify it and check results. Note that only 48 data bits were there, not 55. This is because the 7 state wide bits of data can be gleaned from the 48 lower level bits.

    The performance of 16464 states vs. 6000 states needs to be measured because I am unsure of how long the agent is allowed to ‘think’ about decisions before the piece falls. This is likely explained in the requirements and documentation of the problem.

    A problem with CSS
    Due to the nature of CSS, a problem has arisen. Because it doesn’t look at the whole field and only at a small part of it, lines may not be able to be made in the game, thus netting score. For instance, an agent may only focus on one sub-state and create a large, thick skyscraper of blocks. This won’t help in Tetris, so the problem needs to be addressed.

    To stop this, states could be ignored if they are too far above the other states. For instance, the first state is recorded. Then the second state, which turns out to be N units above the bottom of the first state (lets use N=4). Then this second state is ignored as a viable state and the next state is looked at. This will not only reduce the state space for the agent to look at (not a big problem) but stop the ‘skyscraper effect’. The same goes for the first state. If it is high and the next state is N units lower, the first state is thrown away. Even if the first 4 states are high, then the 5th low, the first 4 are thrown away. Risky scenario in that situation, but we’ll see the results of it.

    A general Tetris problem
    When the stack gets high, the agent will have less time to move the piece about and rotate it. As the piece drops every time an action is chosen (perhaps even after t time also for slow thinking agents), the agent has limited time to get the piece how it needs to be. Using CSS, the agent could minimise the total number of states it has to look at and only look at the nearest ones given the amount of time for re-orientation.
    Hmmm, that works well. A pleasant side-effect of CSS.

    Progress: Better State Definition Method

    With the contour method giving a maximum of 40353607 (7^9) hideous states, something an agent could never work out, I decided I needed a better method of representation.

    If the maximum size of a piece in the game of (standard) Tetris is 4 units, then the state space need only be 4×4. So, with the contour method once more in this reduced state space give maximum 343 states (7^3). Using an explicit method of showing states (i.e. assigning every unit in the 16 squares a value of 1 or 0) gives 65536 states (2^16).

    The main problem with this approach is sub-dividing the field into 4×4 bits. The number of states within a field of 10 columns is 7 (Cols – Max Tetronimo Size + 1). The reason for this number is that many states will overlap one-another by at most 3 columns. Each of these sub-states may be at different heights as they would start from the lowest empty point (disregarding holes in the field) and capture the 4 rows above that point.

    Once a field has been sub-divided and its many states identified, the agent will have to choose one of the states (perhaps using the average reward gained from that state with the given piece) and guide the piece towards it. The agent will need to take into account the rotation of the piece and check all other rotations against the states.

    IMAGE LOST

    More problems and thoughts:

    • A 4×4 space may not be needed. Only 4×3 is required (XxY). Due to straight pieces not caring if it’s 3 or 4 units high. This makes no difference to the contour method though, so is not a big issue.
    • Problems with what an agent thinks is free. If an agent wants to put a piece partly on top of the state space, can it? For instance, if a state is represented as {3,0,-1} (A large climb of 3 or more units then flat then down 1 unit), can the agent put a 90 rotated S piece at the right end of the state? Yes. Can an agent put (incorrectly) a 180 rotated J piece at the left? Yes. An agent cannot assume that it can put a Z piece partly on the state horizontally though. That area is dealt to by the state to the right.
    • Should the state be taken from the top of the highest peak in the 4 cols or the depths of the lowest pit? Lowest would be ideal, usually, but experiments should be done on high peaks. Ideally playing, the agent would not end up with high peaks anyway. Still, need to accommodate for these scenarios.

    Combining point 1 and 2, I have realised that it would probably be best if the state space was 4×3. That way the contour system of cutting off at 3 would better suit. However, going by that logic, the contour system could be further reduced to {-2,…,2} (yielding max states of 125!) and still work relatively well. Only experimenting will solve this. Also, the agent could be restricted to placing pieces on top of a state space, but its learning should cover this.

    Progress: TFR Scrapped, Contour Introduced

    Hmm, after running some numbers using the TFR strategy on a field with 10 columns, the (max) number of states is 1.099511628 x 10^12 (2^40). Which is way too much. This could be done with TTR strategy, giving only 1048576 (2^20) states. This wouldn’t work well with L pieces and straight pieces.

    TFR could still be used if nothing else is available with a small amount of state space reduction. By removing any bottom rows from the 4 rows that aren’t required, the state space can be something else. For example:

    01111111    00000000
    11110111 = 00000000
    11110111 =/ 00000000
    10101111    01111111

    00010000    00000000
    00110000 = 00000000
    11101111 =/ 00010000
    10100110    00110000

    However, even this won’t be enough to sufficiently reduce the state space.

    Another strategy to take is using the contour method as seen here. The values will be all integers between (and including) {-3,…,3}. The contours are measured in relation to the current elevation of the preceding block. So the first column is taken as the base at the highest block and then numbers from {-3,…,3} are used to represent the differences in height of the blocks. Any difference outside the set is truncated to -3 or 3.
    This method will give a maximum of 40353607 (7^9) states with 10 columns (9 = 10 -1). 7 is the number of possible values.

    This strategy should be implemented before TFR strategy due to it’s higher chance of success.

    The contour strategy would also benefit from state space reduction which will be looked at later.