After looking around at my competitors details and personal websites, I realised that my competition was fierce. So, partly giving myself some credit for coming 5th against these guys, and partly finding stuff to put into my Final Report, I am listing my competitor details.
- Loria INRIA – MAIA
- Bruno Scherrer – Senior researcher, written many publications
- Christophe Thiery – Unknown. Possibly been around for a while
- Amine Boumaza – Engineer Expert.
- The Cobras
- Jeff Johns – Researcher, written several publications
- Colin Barringer –
- Marek Petrik – MS/PhD student
- Eotvos Lorand University (I used a paper (noisy cross entropy) from these two)
- Andras Lorincz – Head Senior Researcher. Over 200 publications. I actually used one of his in my Interim Report.
- Istvan Szita – PhD student.
- Can’t find much. Belgium guy, 20 year old. Likely in the same boat as me.
With some of the larger teams, it’s likely that one of the members were only supervisors and provided ideas and help, but who knows. Anyway, apart from ‘Cryptoxic’, I was up against some tough competitors. There were others as well, but these were the ones who performed better.
From what I’ve gathered, the Cobras and Loria used the same style of strategy as I – using features. I’m assuming many other teams did as well. The Cobras however, tell me that they used ‘a very simple 1-step lookahead with a linear combination of a fixed set of features. The weights for each feature were learned using a policy search algorithm.’
This is interesting because when I trialled out my probabilistic method of play, it took too long to run. Because of the time taken, I elected not to use it (that and it hadn’t been properly tested, so it may have been worse). Perhaps if I had used it, I may have placed better. Only performance comparisons will show this. It’d be a shame if it was a winning entry. I doubt it however.
I think I found Loria’s presentation about their method, although it’s dated at October 2007, which means they must’ve formulated this before the competition started. After talking to Bernhard about their presentation at ICML, I believe I have found their feature list. On slide 26, they show a table that outlines 4 base feature types. Bernhard however, said that they had about 20-25 or so features. The table shows that there are individual features for each column (individual column height) and each adjacent column pair (absolute adjacent column height differences). As well as this is the maximum wall height (max height, I guess) and number of holes. That gives, on a standard Tetris field of 10 columns, 21 features. Bernhard mentioned something about them having to add in a few extra features in the last few weeks because of competition.
He also mentioned that they used pre-processing to give an initial boost based on the initial task_spec. So, perhaps a feature set S1 performs well on fields of size x1 by y1 but S2 is used on fields of x2 by y2. More research could go into this to mathematically create feature sets based purely on the field size which are later mutated to accommodate piece distributions and reward exponents.
And another thing the Loria guys did, now I remember it, was to tailor their agent to the restrictions and rules of the competition. Where my agent focused on just playing Tetris well in an infinite game, their agent focused on playing it well in a finite game, and making sacrifices if a piece took too many steps to place. What I mean is, their agent would consider simply placing a piece in a closer location in the middle that wasn’t ideal, but took fewer steps to get to than the ideal piece placement. That way, the agent could place more pieces faster and receive more rewards in the finite number of steps. Also, something was mentioned about suicide if the pile got too high, because there is no penalty for losing, so a fresh slate could be claimed. That rule annoyed me, but those were the rules, and I need to accept them.
Looking at both of my competitors strategies, I might be able to create a better agent that uses both benefits (Individual column features and 1-step lookahead). Firstly though, I need to create a custom GUI interface (optional GUI, I should say for speed) that can run several agents against each other (simultaneously or iteratively. It shouldn’t matter) and output real-time or general results as a graph. The graph could have different scales: overall reward, or reward per step.
Was meant to do some XNA today, but the labs are sealed off to me. I’ll check ’em out again after this post.