At the lab some problems were noted:
– The agent makes most of its game-breaking mistakes when acting exploitatively. It shouldn’t, since the code suggests it should follow good paths, so either the exploitation code is wrong or the rewards aren’t being stored properly.
– The reward is sometimes more than 1. When multiple lines are completed, rewards of 3, 7 or even 16 have been seen.
– The agent also tends to create chasms. Not deliberately: the code it is based on is written to avoid potential chasms, and in doing so it turns them into real chasms by stacking pieces up beside them. This can be remedied by looking at afterstates. Doing so changes the exploratory strategy somewhat, but guides the agent with more forethought. An evaluation function for the afterstates (or even the current states) needs to be written to determine which piece placement is best.
For the most part this is easy to achieve, because the piece will fall in a specific place, so the new contour can be calculated and split into substates. Taken together, these substates can be evaluated by a function that determines the worth of the field. So a field made up of 3 substates (out of a possible 7) is likely to be bad, as it is split up quite a bit. This could be done with just the predicted contour field, but substates are easier to judge, by their number and their types.
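As a rough illustration of predicting the contour of an afterstate, here is a minimal Python sketch. The representation is an assumption of mine, not the agent's actual code: the field is reduced to a list of column heights, and a piece is a list of hypothetical `(bottom_offset, block_height)` pairs, one per column it occupies. Splitting the contour into substates is left out, since that depends on how substates are defined.

```python
def drop(heights, piece, x):
    """Return the column heights after dropping `piece` with its left edge at column x.

    `piece` is a list of (bottom_offset, block_height) pairs, one per column
    (an assumed representation for this sketch).
    """
    # The piece comes to rest at the lowest level where no column overlaps the field.
    landing = max(heights[x + i] - bottom for i, (bottom, _) in enumerate(piece))
    after = list(heights)
    for i, (bottom, size) in enumerate(piece):
        after[x + i] = landing + bottom + size
    return after


def contour(heights):
    """Height difference between each pair of adjacent columns."""
    return [b - a for a, b in zip(heights, heights[1:])]


# Hypothetical piece shapes in the representation above.
O_PIECE = [(0, 2), (0, 2)]          # 2x2 square
S_PIECE = [(0, 1), (0, 2), (1, 1)]  # horizontal S
I_VERTICAL = [(0, 4)]               # upright I

field = [0, 0, 2, 2, 0]
after = drop(field, O_PIECE, 0)
print(after)           # [2, 2, 2, 2, 0]
print(contour(after))  # [0, 0, 0, -2]
```

Because the landing position is fully determined by the column heights and the piece shape, every candidate placement can be turned into a predicted contour before the piece is actually placed.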
The factors on which a field can be evaluated (using only the height contours, so holes and overall height aren’t considered) are:
– Bumpiness of substates
– Number of substates
– Not sure what else. Need to think more on this and look at related papers.
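One common way to put a number on bumpiness, sketched here in Python as an assumption rather than the measure the agent uses, is the sum of absolute height differences between adjacent columns. The function name `bumpiness` and the example fields are hypothetical.

```python
def bumpiness(heights):
    """Sum of absolute height differences between adjacent columns."""
    return sum(abs(b - a) for a, b in zip(heights, heights[1:]))


flat = [2, 2, 2, 2]
chasm = [3, 0, 3, 3]  # a one-column chasm with pieces stacked on both sides

print(bumpiness(flat))   # 0
print(bumpiness(chasm))  # 6
```

A measure like this penalises a chasm twice, once for each wall, which matches the intuition that stacking pieces beside a gap makes the field worse.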
All of this is sort of a step towards evaluation-function reinforcement learning, which is a good thing as it enables an easy transition if I decide to change track.
Also, as a side note, the best number of lines I have seen my agent get so far is about 50.
Edit: Holes in the field should be incorporated into state worth. As it stands, a flat field full of holes is of equal value to an empty field. The height of the field should be incorporated as well. Also, the number of substates doesn’t need to be included, as a field will be judged bad if it is quite bumpy anyway (leading to fewer substates).
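To make the point about holes concrete, here is a hedged Python sketch of an evaluation that combines bumpiness, holes and height. Everything in it is an assumption for illustration: the field is a grid of 0/1 rows with row 0 at the top, a hole is any empty cell below the topmost filled cell of its column, and the weights are arbitrary placeholders, not tuned values.

```python
def column_heights(field):
    """Height of each column; `field` is a list of 0/1 rows, row 0 at the top."""
    rows = len(field)
    heights = []
    for col in range(len(field[0])):
        height = 0
        for row in range(rows):
            if field[row][col]:
                height = rows - row
                break
        heights.append(height)
    return heights


def count_holes(field):
    """Count empty cells lying below the topmost filled cell of their column."""
    rows = len(field)
    holes = 0
    for col, height in enumerate(column_heights(field)):
        top = rows - height
        holes += sum(1 for row in range(top, rows) if not field[row][col])
    return holes


def evaluate(field, w_bump=1.0, w_holes=4.0, w_height=0.5):
    """Higher is better. The weights are placeholder assumptions."""
    heights = column_heights(field)
    bump = sum(abs(b - a) for a, b in zip(heights, heights[1:]))
    return -(w_bump * bump + w_holes * count_holes(field) + w_height * max(heights))


empty = [[0, 0, 0, 0] for _ in range(4)]
holey = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],  # a full row of holes under a flat surface
    [1, 1, 1, 1],
]

# Both fields have a perfectly flat contour, but only one is actually good.
print(evaluate(empty) > evaluate(holey))  # True
```

With contour-only features the two fields would score the same; adding the hole and height terms separates them.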