With the contour method giving a maximum of 40,353,607 (7^9) hideous states, far more than an agent could ever work through, I decided I needed a better method of representation.
If the maximum size of a piece in (standard) Tetris is 4 units, then the state space need only be 4×4. Applying the contour method once more to this reduced state space gives a maximum of 343 states (7^3). Using an explicit method of showing states (i.e. assigning each of the 16 squares a value of 1 or 0) gives 65,536 states (2^16).
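As a quick check of these counts, here is a minimal sketch. It assumes a contour is the list of height differences between adjacent columns, each clipped to the range -3 to 3 (seven values per difference), which is what the 7^9 and 7^3 figures imply. The function names are my own for illustration.

```python
def contour(heights, clip=3):
    """Height differences between adjacent columns, clipped to [-clip, clip].

    Assumption: a positive value means the surface climbs from left to right.
    """
    return tuple(max(-clip, min(clip, right - left))
                 for left, right in zip(heights, heights[1:]))


def max_contour_states(columns, clip=3):
    """Upper bound on distinct contours: (2*clip + 1) values per adjacent pair."""
    return (2 * clip + 1) ** (columns - 1)


print(max_contour_states(10))  # full 10-column field: 40353607
print(max_contour_states(4))   # 4-column sub-state: 343
print(2 ** (4 * 4))            # explicit 4x4 grid of 0/1 cells: 65536
print(contour([0, 4, 4, 3]))   # (3, 0, -1)
```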
The main problem with this approach is sub-dividing the field into 4×4 pieces. The number of sub-states within a field of 10 columns is 7 (Cols - Max Tetromino Size + 1 = 10 - 4 + 1). The reason for this number is that the sub-states overlap one another, with neighbouring ones sharing 3 columns. Each of these sub-states may be at a different height, as each starts from the lowest empty point in its columns (disregarding holes in the field) and captures the 4 rows above that point.
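The sub-division could look roughly like the sketch below. It assumes the field is summarised by its column heights (so holes are ignored), and that each sub-state records its left-most column, its base height (the lowest point among its 4 columns) and its clipped contour. The name `sub_states` and the tuple layout are hypothetical.

```python
def sub_states(heights, width=4, clip=3):
    """Slide a `width`-column window across the field, one column at a time.

    Returns a list of (left_column, base_height, contour) tuples.
    Assumption: `heights` holds the height of the top filled cell per column.
    """
    result = []
    for left in range(len(heights) - width + 1):
        cols = heights[left:left + width]
        base = min(cols)  # lowest empty point in this window
        contour = tuple(max(-clip, min(clip, b - a))
                        for a, b in zip(cols, cols[1:]))
        result.append((left, base, contour))
    return result


field = [0, 4, 4, 3, 3, 1, 0, 0, 2, 2]  # made-up column heights
for state in sub_states(field):
    print(state)
```

With 10 columns this yields 7 windows, and neighbouring windows share 3 of their 4 columns.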
Once a field has been sub-divided and its many states identified, the agent will have to choose one of the states (perhaps using the average reward gained from that state with the given piece) and guide the piece towards it. The agent will need to take into account the rotation of the piece and check all other rotations against the states.
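One way the selection step might work is sketched below. Everything here is an assumption for illustration: the `AverageReward` table, keying on (contour, piece, rotation), and treating unseen combinations as having a value of 0. It only picks the target sub-state and rotation; guiding the piece there is not covered.

```python
from collections import defaultdict


class AverageReward:
    """Running average reward per (contour, piece, rotation). Hypothetical helper."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, contour, piece, rotation, reward):
        key = (contour, piece, rotation)
        self.total[key] += reward
        self.count[key] += 1

    def value(self, contour, piece, rotation):
        key = (contour, piece, rotation)
        seen = self.count.get(key, 0)
        # Assumption: combinations never tried are valued at 0.
        return self.total[key] / seen if seen else 0.0


def choose_target(states, piece, rotations, table):
    """Check every rotation against every sub-state; return the best (left, rotation)."""
    candidates = [(left, rotation, table.value(contour, piece, rotation))
                  for left, contour in states
                  for rotation in rotations]
    left, rotation, _ = max(candidates, key=lambda c: c[2])
    return left, rotation


table = AverageReward()
table.update((0, -1, 0), "S", 90, 10.0)
table.update((3, 0, -1), "S", 0, -2.0)
states = [(0, (3, 0, -1)), (1, (0, -1, 0))]  # (left column, contour)
print(choose_target(states, "S", (0, 90), table))  # (1, 90)
```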
More problems and thoughts:
- A 4×4 space may not be needed. Only 4×3 (X×Y) is required, because a straight piece does not care whether the space is 3 or 4 units high. This makes no difference to the contour method though, so it is not a big issue.
- There are problems with what an agent thinks is free. If an agent wants to put a piece partly on top of the state space, can it? For instance, if a state is represented as {3,0,-1} (a large climb of 3 or more units, then flat, then down 1 unit), can the agent put a 90° rotated S piece at the right end of the state? Yes. Can an agent (incorrectly) put a 180° rotated J piece at the left? Yes. An agent cannot assume that it can put a Z piece horizontally so that it sits only partly on the state, though. That area is dealt with by the state to the right.
- Should the state be taken from the top of the highest peak in the 4 columns or from the depths of the lowest pit? The lowest would usually be ideal, but experiments should be done on high peaks. Playing ideally, the agent would not end up with high peaks anyway. Still, these scenarios need to be accommodated.
Combining points 1 and 2, I have realised that it would probably be best if the state space was 4×3, since that better suits the contour system's cut-off at 3. However, going by that logic, the contour system could be further reduced to {-2,…,2} (yielding a maximum of 125 states!) and still work relatively well. Only experimenting will settle this. Also, the agent could be restricted to placing pieces on top of a state space, but its learning should cover this.
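To see what the tighter cut-off would cost, here is a small sketch that maps a 4-column contour to a single table index, assuming the same clipped-difference contour as before. The function name `contour_index` is made up. With a cut-off of 2 there are 5^3 = 125 indices, but surfaces that differ only in steps taller than 2 collapse into the same state.

```python
def contour_index(heights, clip=2):
    """Encode the clipped contour as one integer in base (2*clip + 1).

    Assumption: each difference d in [-clip, clip] becomes the digit d + clip.
    """
    base = 2 * clip + 1
    index = 0
    for a, b in zip(heights, heights[1:]):
        d = max(-clip, min(clip, b - a))
        index = index * base + (d + clip)
    return index


steep = [0, 4, 4, 3]    # climb of 4, flat, down 1
gentle = [0, 2, 2, 1]   # climb of 2, flat, down 1

# Cut-off at 2: both surfaces become (2, 0, -1) and share an index.
print(contour_index(steep, clip=2), contour_index(gentle, clip=2))  # 111 111

# Cut-off at 3: the two surfaces stay distinct.
print(contour_index(steep, clip=3), contour_index(gentle, clip=3))  # 317 268
```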