The lack of progress lately is down to having to be at the flat for my coffee table and staying at Dal's parents' for the night. And it's too damn hot to trek to uni…
Notes I took down after leaving the lab on Monday:
Tracking a piece
Given the first observation, we should know exactly where a piece is. This depends on whether pieces can fall from different places in different MDPs. That would be a good thing to throw at us during proving, so I'll accommodate for it.
Best to capture the piece by making the default first action of each episode idle. Then use the second observation to see where the piece is (by comparing it against the first) and use that knowledge of which piece is where for the rest of the problem.
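The idle-then-diff idea above can be sketched as follows. This assumes the observation exposes the board as a row-major int[] of 0/1 cells, which is my own placeholder layout, not the actual Observation type:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the piece-detection idea: idle on the first action, then diff the
// first two observations to find the newly occupied cells, i.e. the piece.
// The flattened int[] board layout is an assumption for illustration only.
public class PieceFinder {
    // Returns the indices of cells occupied in `current` but not in `previous`.
    static List<Integer> newCells(int[] previous, int[] current) {
        List<Integer> cells = new ArrayList<>();
        for (int i = 0; i < current.length; i++) {
            if (current[i] == 1 && previous[i] == 0) {
                cells.add(i);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        // 2x4 board: the piece appears in the top row between steps.
        int[] first  = {0, 0, 0, 0,  0, 0, 0, 0};
        int[] second = {0, 1, 1, 0,  0, 0, 0, 0};
        System.out.println(newCells(first, second)); // [1, 2]
    }
}
```

The diff of the first two observations gives the piece's cells directly, since nothing else on the board changes while we idle.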
Basic pseudocode for program
/* Private members */
Action action_;
Observation prevObservation_;
Tetronimo piece_;
int rows_;
int columns_;
TaskSpecObj TSO_;
Bimap
Policy policy_;
SubState goalState_;
Tetronimo goalTetronimo_;
boolean frozen_ = false;
agent_init(String taskSpec) {
TSO_ = parseSpec(taskSpec);
action_ = new Action(*blah*, *blah*);
rows_ = TSO_.rows();
columns_ = TSO_.cols();
policy_ = initPolicy();
}
agent_start (Observation o) {
action_ = Actions.idle;
prevObservation_ = o;
return action_;
}
agent_step (Observation o, double reward) {
if (!frozen_) {
if (pieceLanded()) {
updatePolicy(goalState_, goalTetronimo_, reward);
piece_ = null;
goalState_ = null;
goalTetronimo_ = null;
}
}
if (piece_ == null) {
piece_ = findPiece(o);
contourField = contourScan(o);
substates = splitField(contourField);
}
if (frozen_) action_ = greedyAction(substates);
else action_ = pickAction(substates);
prevObservation_ = o;
return action_;
}
Note that this is pseudocode and not exactly correct. However, it captures the basic algorithm.
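contourScan isn't written down anywhere yet, so as a placeholder, here's one way it could work, assuming "contour" means the column-height skyline of the settled field. Both that interpretation and the flattened int[] board layout are my assumptions, not the final design:

```java
// Hypothetical contourScan: reduce the settled field to its skyline, i.e. the
// height of the topmost occupied cell in each column. The row-major int[]
// board layout and the skyline reading of "contour" are assumptions.
public class ContourScan {
    // board has rows*columns cells with row 0 at the top;
    // returns the filled height of each column.
    static int[] contourScan(int[] board, int rows, int columns) {
        int[] heights = new int[columns];
        for (int c = 0; c < columns; c++) {
            for (int r = 0; r < rows; r++) {
                if (board[r * columns + c] == 1) {
                    heights[c] = rows - r; // first filled cell from the top
                    break;
                }
            }
        }
        return heights;
    }

    public static void main(String[] args) {
        int[] board = {
            0, 0, 0,
            1, 0, 0,
            1, 1, 0,
        };
        int[] h = contourScan(board, 3, 3);
        System.out.println(java.util.Arrays.toString(h)); // [2, 1, 0]
    }
}
```

A skyline like this would give splitField something cheap to cut on, e.g. splitting wherever adjacent column heights jump by more than a piece can span.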