While implementing the new splitField() method (which is no mean feat!), I ran into more problems.

– Firstly, the agent is capturing states that are already encompassed by a bigger state. This problem is solvable – I can feel it – it just requires the right algorithm, but for now it has me puzzled.
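One possible shape for that algorithm – purely a sketch, since I haven't settled on it yet, and assuming states are tuples and "encompassed" means the smaller state appears as a contiguous window of the bigger one:

```python
def is_subsumed(small, big):
    """Return True if `small` appears as a contiguous window of `big`.

    A newly captured state would be discarded (or merged) if some
    already-stored bigger state subsumes it. The tuple representation
    and the contiguous-window reading of "encompassed" are assumptions
    for illustration, not the final algorithm.
    """
    k = len(small)
    return any(tuple(big[i:i + k]) == tuple(small)
               for i in range(len(big) - k + 1))
```

A duplicate-filtering pass would then drop any captured state for which `is_subsumed(state, existing)` holds for some stored `existing` state.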

– Secondly, by using variable-size states, I increase the number of states and thus the time it takes for the agent to learn. However, because every state smaller than size 4 fits into size 4 states, multiple updates can be done: a state of size n fits into (4 − n + 1) × 5^(4−n) size 4 states. For instance, the size 3 state {1, 2, null} fits into the size 4 states {1, 2, null, any} and {any, 1, 2, null}, which is a total of 10 states (any can be 1 of 5 numbers). A size 2 state {0, null, null} fits into 75 size 4 states. This makes state reward storing more efficient, but also more time-consuming. It's definitely worth it, though, due to the faster learning rate.
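The counting above can be sanity-checked by brute force – a sketch, assuming each free slot can take one of 5 values and that a smaller state sits in a size 4 state as a contiguous window (the function name and representation are mine, not from the actual code):

```python
from itertools import product

VALUES = range(5)  # assuming each cell takes one of 5 values, per the post

def expansions(state, target_size=4):
    """List every size-4 state (one entry per window position) that
    contains `state` as a contiguous window; each entry is one
    reward update to perform."""
    k = len(state)
    fits = []
    for offset in range(target_size - k + 1):
        free = target_size - k                      # slots outside the window
        for fill in product(VALUES, repeat=free):
            fill = list(fill)
            full = fill[:offset] + list(state) + fill[offset:]
            fits.append(tuple(full))
    return fits

print(len(expansions([1, 2, 0])))  # (4 - 3 + 1) * 5**(4 - 3) = 10
print(len(expansions([0, 3])))     # (4 - 2 + 1) * 5**(4 - 2) = 75
```

Note this counts window positions rather than distinct states (a repetitive state like {1, 1, 1} can match the same size 4 state at two offsets), which matches the "multiple updates" reading: each fit is an update.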

A helluva task, and I expect it to take some time to incorporate these new changes – perhaps a week or two. A bit of a setback, but it should work out for the best.