Progress: Line completion and Other Stuff

Quick update.

After playing Tetris, I realised that the way people play Tetris usually involves making lines whenever possible. This means that the agent would benefit from knowing if a gap that can be filled is the last gap in the line. This would require another variable stating whether the row is nearly full.

I feel that the strategy would work whether or not this variable was there, but the variable would make the agent better. Something for investigation.

Edit: The agent would also benefit from putting pieces next to walls when in the first stages of the game (or any stage with a flattish surface).

Edit: Playing the game you are hoping to create something for makes it so much more involving! Placing pieces next to walls and to complete lines should have higher artificial rewards. By artificial, I mean these rewards would temporarily influence the agent to put the piece there, depending on the playing field at the time.

Progress: Further Expansion of Multiple Mini-States

The mini-states idea seems quite good, or at least better than the other ones. It needs a name. Perhaps Contoured Sub-States (CSS, taken already – twice. Oh well).

General Run-time Algorithm
When the program is running, the agent will need to:

  1. Choose its mode: Exploitation or Exploration.
  2. Exploitation
  1. Choose a successful state with the given piece.
  2. Rotate and move the piece to fit in the state properly.
  • Exploration
    1. Choose a random state.
    2. Choose a random rotation.
    3. Choose a random location above the state (out of max 4 locations).
  • Observe the reward and store information.
  • This will require several things:

    • The sub-states are saved.
    • Piece-value numbers need to be saved to those states (7 in total).
    • Within each state, the location and orientation values must be saved for each piece (48 in total).

    This results in each state having 48 extra bits of data affixed to it. Thus giving 16464 bits of data in total. Using the smaller state space (contours: {-2,…,2}) gives 6000. I’d like to start with the contour space of {-3,…,3} first though, and then modify it and check results. Note that only 48 data bits were there, not 55. This is because the 7 state wide bits of data can be gleaned from the 48 lower level bits.

    The performance of 16464 states vs. 6000 states needs to be measured because I am unsure of how long the agent is allowed to ‘think’ about decisions before the piece falls. This is likely explained in the requirements and documentation of the problem.

    A problem with CSS
    Due to the nature of CSS, a problem has arisen. Because it doesn’t look at the whole field and only at a small part of it, lines may not be able to be made in the game, thus netting score. For instance, an agent may only focus on one sub-state and create a large, thick skyscraper of blocks. This won’t help in Tetris, so the problem needs to be addressed.

    To stop this, states could be ignored if they are too far above the other states. For instance, the first state is recorded. Then the second state, which turns out to be N units above the bottom of the first state (lets use N=4). Then this second state is ignored as a viable state and the next state is looked at. This will not only reduce the state space for the agent to look at (not a big problem) but stop the ‘skyscraper effect’. The same goes for the first state. If it is high and the next state is N units lower, the first state is thrown away. Even if the first 4 states are high, then the 5th low, the first 4 are thrown away. Risky scenario in that situation, but we’ll see the results of it.

    A general Tetris problem
    When the stack gets high, the agent will have less time to move the piece about and rotate it. As the piece drops every time an action is chosen (perhaps even after t time also for slow thinking agents), the agent has limited time to get the piece how it needs to be. Using CSS, the agent could minimise the total number of states it has to look at and only look at the nearest ones given the amount of time for re-orientation.
    Hmmm, that works well. A pleasant side-effect of CSS.

    Progress: Better State Definition Method

    With the contour method giving a maximum of 40353607 (7^9) hideous states, something an agent could never work out, I decided I needed a better method of representation.

    If the maximum size of a piece in the game of (standard) Tetris is 4 units, then the state space need only be 4×4. So, with the contour method once more in this reduced state space give maximum 343 states (7^3). Using an explicit method of showing states (i.e. assigning every unit in the 16 squares a value of 1 or 0) gives 65536 states (2^16).

    The main problem with this approach is sub-dividing the field into 4×4 bits. The number of states within a field of 10 columns is 7 (Cols – Max Tetronimo Size + 1). The reason for this number is that many states will overlap one-another by at most 3 columns. Each of these sub-states may be at different heights as they would start from the lowest empty point (disregarding holes in the field) and capture the 4 rows above that point.

    Once a field has been sub-divided and its many states identified, the agent will have to choose one of the states (perhaps using the average reward gained from that state with the given piece) and guide the piece towards it. The agent will need to take into account the rotation of the piece and check all other rotations against the states.

    IMAGE LOST

    More problems and thoughts:

    • A 4×4 space may not be needed. Only 4×3 is required (XxY). Due to straight pieces not caring if it’s 3 or 4 units high. This makes no difference to the contour method though, so is not a big issue.
    • Problems with what an agent thinks is free. If an agent wants to put a piece partly on top of the state space, can it? For instance, if a state is represented as {3,0,-1} (A large climb of 3 or more units then flat then down 1 unit), can the agent put a 90 rotated S piece at the right end of the state? Yes. Can an agent put (incorrectly) a 180 rotated J piece at the left? Yes. An agent cannot assume that it can put a Z piece partly on the state horizontally though. That area is dealt to by the state to the right.
    • Should the state be taken from the top of the highest peak in the 4 cols or the depths of the lowest pit? Lowest would be ideal, usually, but experiments should be done on high peaks. Ideally playing, the agent would not end up with high peaks anyway. Still, need to accommodate for these scenarios.

    Combining point 1 and 2, I have realised that it would probably be best if the state space was 4×3. That way the contour system of cutting off at 3 would better suit. However, going by that logic, the contour system could be further reduced to {-2,…,2} (yielding max states of 125!) and still work relatively well. Only experimenting will solve this. Also, the agent could be restricted to placing pieces on top of a state space, but its learning should cover this.