I’m still alive. Just swamped with other assignments. As I haven’t had a meeting or made any progress in over 2 weeks, I figured that at least something should be done.

Notes were scrawled at home in my book, and they will be uploaded in time. But for now, here are some notes from the lab:

– Roughly, every 11 MOVES results in a line cleared (2200 moves / 196 lines = 11.22; 4658 moves / 423 lines = 11.01). Thus, if the evaluating algorithm wants about 10 lines for evaluation, give it 110 moves to do so.
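The arithmetic above can be sketched as two tiny helpers (function names are my own, not from the notes):

```python
def moves_per_line(moves, lines):
    """Average number of moves it took to clear one line."""
    return moves / lines

def evaluation_window(target_lines, moves_per_line_estimate=11):
    """Moves to allocate so an evaluation run sees roughly target_lines clears."""
    return target_lines * moves_per_line_estimate
```

For the two lab measurements, `moves_per_line(2200, 196)` gives about 11.22 and `moves_per_line(4658, 423)` about 11.01, so `evaluation_window(10)` lands on the 110 moves mentioned above.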

– Also include the starting height and finishing height of the fields, or maybe a simple function without multipliers that computes (evaluation of the finishing field – evaluation of the starting field).
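A minimal sketch of that multiplier-free delta, assuming a field is represented as a list of column heights and using negative total height as a stand-in evaluation (both assumptions are mine):

```python
def evaluate(field):
    """Unweighted score: negative total column height (taller fields score worse)."""
    return -sum(field)

def delta_evaluate(start_field, finish_field):
    """Score a move by how much the field evaluation changed."""
    return evaluate(finish_field) - evaluate(start_field)
```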

– Remember to factor in emergency play, which focuses on lowering the height and clearing lines as much as possible when the stack grows too big.
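One way emergency play could be wired in is a simple weight switch once the stack passes a threshold; the threshold and the two weight sets here are placeholders, not values from the notes:

```python
EMERGENCY_HEIGHT = 14  # assumed threshold; tune per board size

def choose_weights(stack_height, normal_weights, emergency_weights):
    """Switch to emergency weights (which should favour height reduction
    and line clears) whenever the stack grows too big."""
    if stack_height >= EMERGENCY_HEIGHT:
        return emergency_weights
    return normal_weights
```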

– Try queuing up multiple possible parameter sets for evaluation. A crossover of 2 parameter sets nets 2 (or more) offspring – they all need to be tested.

Also, gonna try a proving run today. Without learning anything though. Just a static agent.

More notes to come…

Hand-scrawled notes:

Like the genetic algorithm, test various parameters that aren’t too different from the previous parameters. Record the results while testing.

Exploratory Strategy:

– In the early stages

- Choose a (bounded) random set of values for the multipliers (bounded: must not change the +/- sign). Test it out over N steps and record the results.
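A sketch of that sign-preserving bounded randomisation; the relative `spread` is an assumption of mine, but keeping it below 1 is what guarantees the sign never flips:

```python
import random

def bounded_random(multipliers, spread=0.5):
    """Pick a random value near each multiplier without flipping its +/- sign."""
    out = []
    for m in multipliers:
        # scale factor stays in (1 - spread, 1 + spread); with spread < 1
        # it is always positive, so the sign of m is preserved
        out.append(m * (1 + random.uniform(-spread, spread)))
    return out
```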

– In the later stages

- Combine the two best solutions (or maybe more) in a certain manner or manners (crossover/midpoint/parameter trend) and try out the new solution(s).
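The crossover and midpoint combinations might look like this (parameter sets as plain lists; the 50/50 gene split is one common choice, not necessarily the intended one). Note crossover yields two complementary offspring, which ties back to the earlier point that every offspring in the queue needs testing:

```python
import random

def midpoint(a, b):
    """Average each parameter of the two best solutions."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def crossover(a, b):
    """Take each parameter from one parent or the other at random,
    producing two complementary offspring."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if random.random() < 0.5:
            child1.append(x)
            child2.append(y)
        else:
            child1.append(y)
            child2.append(x)
    return child1, child2
```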

Greedy Strategy

– Stick with the best parameter set. Can still record results though and refine its average value while exploiting.
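"Learning a closer value while exploiting" can be done with an incremental running-average update, so each extra step played under the greedy strategy tightens the reward estimate (a standard trick, assumed here rather than taken from the notes):

```python
def update_average(avg_reward, steps, new_reward):
    """Fold one more observed reward into the running average for the
    currently exploited parameter set."""
    steps += 1
    avg_reward += (new_reward - avg_reward) / steps
    return avg_reward, steps
```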

Storage

– Store the multiplier parameters and the average reward, i.e. (total reward over N steps) / N. If re-storing, keep whichever value has been judged from more steps. If the step counts are equal, keep whichever is better.
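That re-storing rule as a sketch, with records as `(params, avg_reward, steps)` tuples (the record shape is my assumption):

```python
def store(existing, candidate):
    """Keep the record judged from more steps; on a step-count tie,
    keep the better average reward."""
    if existing is None:
        return candidate
    if candidate[2] != existing[2]:
        return candidate if candidate[2] > existing[2] else existing
    return candidate if candidate[1] > existing[1] else existing
```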

New random parameter initialisation

– Probably best to try mutating only a single parameter from the source, multiplying or dividing it by ALPHA (say ALPHA = 2).
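A minimal sketch of that single-parameter */ ALPHA mutation (the coin flip between multiply and divide is my reading of the note):

```python
import random

ALPHA = 2  # mutation factor suggested in the note

def mutate_one(params):
    """Copy the source set, then multiply or divide one random parameter by ALPHA."""
    out = list(params)
    i = random.randrange(len(out))
    out[i] = out[i] * ALPHA if random.random() < 0.5 else out[i] / ALPHA
    return out
```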

– The best mutations will be used more and create better play.

With all this info, the next version is clear: create a genetically learning parameter set by keeping track of which parameters are best over a series of N *pieces* (moves can be too variable).
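The evaluation loop for that version could be as small as this, scoring a parameter set per piece rather than per move; `play_piece` is a hypothetical callback standing in for the actual game step:

```python
def run_episode(params, n_pieces, play_piece):
    """Average reward of a parameter set over a fixed number of pieces.
    Counting pieces instead of moves avoids the move-count variability."""
    total = 0.0
    for _ in range(n_pieces):
        total += play_piece(params)
    return total / n_pieces
```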