There are still many things to do for this Relational Cross Entropy system. It still hasn’t fully ‘recovered’ from the shift up to the relational domain, most notably in file saving/loading.
- Implement rule saving/loading, preferably in a human-readable format; perhaps utilise an XML style of format (a possible layout is sketched after this list). See line 322 in BlocksWorldStateSpec for an example of the rule format.
- Implement generator saving/loading, again ideally in a human-readable format, such that only rules with a non-zero probability are listed as possibilities. Maybe output two generator files: one with the actual generator values, the other with the rounded/bounded values.
- Check that the experiment ETA measure is running correctly. I don’t think it is working properly.
- Normalise the Blocks World scores. By this, I mean give the agent a reward of 0 if it plays optimally and -X for every step over optimal (a sketch of this calculation follows the list). This requires having an optimal policy for each goal to find out what the optimal number of steps is. Once again, see the getOptimalPolicy() method in the StateSpecs.
- Mutation strategy.
- As mentioned here, I could use Occam’s Razor to weight rules and policies against one another. The problem is how to define such a weighting function (one possibility is sketched after this list). A further consideration to take into account (care of Tony Smith) is to factor in the number of steps in PacMan. This could be part of the explicit environment reward, or an internal reward on the agent’s part.
- Currently, the experiments I am running only use populations of 100. This is probably quite bad for PacMan, which needs more than that to function well. The Blocks World environments are converging on a value, but I can’t tell whether that value is optimal. In this post, I talk about dynamically controlling the parameters.
- Action voting, especially in Ms. Pac-Man (a voting sketch follows this list). This will slow the program down somewhat, but may improve performance. For instance, the majority of nearby dots may be to the right, but one to the left, and PacMan may choose left by chance. But then again, who’s to say this is bad?
- Firing rules. Many rules left in the policies of the Blocks World problems are redundant. This may not occur anymore once rules that misfire, or never fire, are removed, but an alternative is to only update rules that do fire (sketched after this list), which takes care of the never-firing problem anyway. However, it may cause problems in the cross-entropy update process: it seems that when firing rules are used and the policies are frozen during testing, the resulting policy set is empty because all values are below 0.5.
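
Regarding the rule saving/loading bullet, here is a minimal sketch of the kind of human-readable, XML-style output I have in mind. The Rule class, its field names and the element names are placeholders of mine, not the system’s actual classes, and the condition strings would come from whatever format BlocksWorldStateSpec uses.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

public class RuleSerializer {
    /** Minimal stand-in for a relational rule: conditions => action. */
    public static class Rule {
        public List<String> conditions;
        public String action;
        public Rule(List<String> conditions, String action) {
            this.conditions = conditions;
            this.action = action;
        }
    }

    /** Writes rules in a simple XML-style, human-readable layout. */
    public static void saveRules(List<Rule> rules, String path) throws IOException {
        try (FileWriter out = new FileWriter(path)) {
            out.write("<rules>\n");
            for (Rule rule : rules) {
                out.write("  <rule>\n");
                // One condition per line keeps the file easy to read by eye.
                for (String cond : rule.conditions)
                    out.write("    <condition>" + cond + "</condition>\n");
                out.write("    <action>" + rule.action + "</action>\n");
                out.write("  </rule>\n");
            }
            out.write("</rules>\n");
        }
    }
}
```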
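For the score normalisation bullet, a rough sketch of the intended calculation, assuming the optimal step count is obtained separately (e.g. by running the policy from getOptimalPolicy()); the penalty constant is just an assumed placeholder for X.

```java
public class ScoreNormaliser {
    private static final double STEP_PENALTY = 1.0; // the "X" lost per extra step

    /** 0 when the agent matches the optimal step count, -X per step over it. */
    public static double normalisedReward(int stepsTaken, int optimalSteps) {
        int extraSteps = Math.max(0, stepsTaken - optimalSteps);
        return -STEP_PENALTY * extraSteps;
    }

    public static void main(String[] args) {
        // e.g. optimal policy solves the goal in 4 steps; agent takes 7 => reward -3.
        System.out.println(normalisedReward(7, 4));
    }
}
```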
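For the Occam’s Razor bullet, one candidate weighting function, purely as a starting point: penalise a policy’s average reward by its total number of rule conditions. The lambda parameter and the choice of counting conditions are assumptions, not a settled design.

```java
public class OccamWeighting {
    /** Simpler policies (fewer conditions) lose less of their raw reward. */
    public static double weightedValue(double averageReward,
                                       int totalConditions, double lambda) {
        return averageReward - lambda * totalConditions;
    }

    public static void main(String[] args) {
        // Two policies with equal reward: the simpler one scores higher.
        System.out.println(weightedValue(10.0, 5, 0.1));  // 9.5
        System.out.println(weightedValue(10.0, 12, 0.1)); // 8.8
    }
}
```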
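For the action voting bullet, a small sketch of majority voting over the actions suggested by the firing rules; actions are just Strings here for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActionVoter {
    /** Returns the action suggested by the most firing rules. */
    public static String vote(List<String> suggestedActions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String action : suggestedActions)
            counts.merge(action, 1, Integer::sum);
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Most nearby dots to the right, one to the left: voting picks RIGHT.
        System.out.println(vote(List.of("RIGHT", "RIGHT", "RIGHT", "LEFT")));
    }
}
```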
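And for the firing rules bullet, a sketch of a cross-entropy style update restricted to rules that actually fired, using my own assumed names; unfired rules simply keep their previous probabilities, which is also why freezing at a 0.5 threshold can leave the policy set empty.

```java
import java.util.Map;
import java.util.Set;

public class FiringRuleUpdater {
    /**
     * Moves each fired rule's probability towards the fraction of elite
     * samples it fired in; rules that never fired are left untouched.
     */
    public static void update(Map<String, Double> ruleProbs,
                              Set<String> firedRules,
                              Map<String, Integer> eliteFireCounts,
                              int numEliteSamples, double stepSize) {
        for (String rule : firedRules) {
            double eliteFraction =
                    eliteFireCounts.getOrDefault(rule, 0) / (double) numEliteSamples;
            double old = ruleProbs.getOrDefault(rule, 0.5);
            ruleProbs.put(rule, (1 - stepSize) * old + stepSize * eliteFraction);
        }
        // Rules outside firedRules keep their previous probability, so nothing
        // may ever be pulled above the 0.5 freezing threshold.
    }
}
```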
I’m sure there’s more, but these are the current issues I know of. Hopefully I can get ’em done before next year. In any case though, I have a proposal to write. Damn thing…
Remember to implement bullet 2 of the list. It would come in handy to see which rules are generated.