Well, I’ve been up and down with this whole cross-entropy thing. First it was optimism, then pessimism, then optimism again.
As of now, the code is running a Blocks World experiment in which all the predicates should be working. Previously the above and highest predicates weren’t implemented, resulting in bare-minimum policies of simply putting a onto b. Also, the rules are not yet being regenerated or mutated, so the initial rulebase is the main factor in performance. Reintegration is active, though, so there is a higher chance of sharing useful rules.
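For reference, the newly implemented predicates can be sketched roughly as below. This is a minimal illustration only: the state representation (a dict mapping each block to the block it sits on, with None meaning the table) and all function names are assumptions, not the project’s actual data structures.

```python
def on(state, a, b):
    """True if block a sits directly on block b."""
    return state.get(a) == b

def above(state, a, b):
    """True if block a is somewhere above block b in the same stack."""
    below = state.get(a)
    while below is not None:
        if below == b:
            return True
        below = state.get(below)
    return False

def height(state, a):
    """Number of blocks beneath a (0 when a is on the table)."""
    h = 0
    below = state.get(a)
    while below is not None:
        h += 1
        below = state.get(below)
    return h

def highest(state, a):
    """True if no block in the state is higher than a."""
    return all(height(state, a) >= height(state, b) for b in state)
```

Note that above and highest both walk the stack rather than reading a stored fact, which is consistent with them being inferred predicates and therefore slower than the directly stored ones.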
Regeneration and mutation cannot be implemented yet because predicate usage is not being counted. There are many other areas of work remaining, but they are marked with TODOs and will be tackled eventually.
Anyway, because things are looking up again, I think this could possibly become a paper. Cross-entropy hasn’t been applied to the relational domain as far as I know, so it could be a novel contribution. The papers to reference include the FOXCS paper; the original Szita and Lőrincz cross-entropy paper; Evolving and Transferring Probabilistic Policies for Relational Reinforcement Learning by Martijn van Otterlo and Tim De Vuyst; and other less important references, such as Mandarax and the Blocks World definition.
The paper will outline the original cross-entropy algorithm and the heavily modified CE algorithm for the relational domain. The PacMan example may serve as just a toy example, showing that the method can be compared against Szita and Lőrincz’s original paper. The Blocks World example will allow comparison with FOXCS and, really, any other RRL algorithm.
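The core cross-entropy update the paper would describe can be sketched as below, in the Szita and Lőrincz style: each candidate rule carries a selection probability, policies are sampled from those probabilities, and the probabilities are pulled toward the elite (top ρ fraction) of sampled policies. All names here (RULES, evaluate, RHO, ALPHA) are placeholder assumptions, not the project’s actual identifiers, and evaluate is a random stand-in for running a policy in the environment.

```python
import random

RULES = ["move(a,b)", "move(a,table)", "move(b,c)", "move(c,a)"]
RHO = 0.2     # elite fraction of sampled policies to keep
ALPHA = 0.6   # step size of the probability update

def evaluate(policy):
    """Stand-in for running the policy in the environment and scoring it."""
    return sum(random.random() for _ in policy)

def ce_iteration(probs, n_samples=100):
    # Sample policies: each rule is included with its current probability.
    samples = []
    for _ in range(n_samples):
        policy = [r for r in RULES if random.random() < probs[r]]
        samples.append((evaluate(policy), policy))
    # Keep the elite fraction of samples by score.
    samples.sort(key=lambda s: s[0], reverse=True)
    elites = [p for _, p in samples[: max(1, int(RHO * n_samples))]]
    # Move each rule's probability toward its frequency among the elites.
    for r in RULES:
        freq = sum(r in p for p in elites) / len(elites)
        probs[r] = (1 - ALPHA) * probs[r] + ALPHA * freq
    return probs

probs = {r: 0.5 for r in RULES}
for _ in range(10):
    probs = ce_iteration(probs)
```

The relational modification would mainly change what a sample is (a rulebase of first-order rules rather than a flat action table), but the elite-selection-and-update loop stays the same shape.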
GRAGH! No wonder it was going so fast: the arguments file was only running it with a population of 100. So it’s likely to take 50 minutes rather than 5. Although even that has increased now, because inferring above and highest slows the algorithm somewhat. Perhaps the JPredicates slow things down; most probably they do, as each method has to be run and unified. I will have to check this.