While implementing the algorithm, I came up with some more thoughts on its details.
– As the cooling rate affects the eTemperature, the eTemp could in turn affect the minimum cooling rate. So the minimum cooling rate could be defined as 1 – eTemp. Therefore, an eTemp of 0.2 means the minimum cooling rate value is 0.8. This will guarantee the cooling rate is adjusted back to 0.99 once the eTemp has settled.
– As for the maximum cooling rate, I think the max rate can be defined as 1 + (1 – eTemp^2). Therefore, an eTemp of 0.8 results in a max rate of 1.36.
– The threshold’s original modification formula seems to move too fast (judging by a thought experiment, not a measurement). Therefore, I’ll change the / 10 to / 100.
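The two cooling rate bounds above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the actual implementation: the function names are made up, and it assumes eTemp always lies between 0 and 1.

```python
from typing import Tuple


def cooling_rate_bounds(e_temp: float) -> Tuple[float, float]:
    """Return (min_rate, max_rate) for a given eTemp.

    Assumption: e_temp lies in [0, 1]; the names here are hypothetical.
    """
    min_rate = 1.0 - e_temp                # e.g. eTemp 0.2 -> min 0.8
    max_rate = 1.0 + (1.0 - e_temp ** 2)   # e.g. eTemp 0.8 -> max 1.36
    return min_rate, max_rate


def clamp_cooling_rate(rate: float, e_temp: float) -> float:
    """Keep a proposed cooling rate inside the eTemp-dependent bounds."""
    lo, hi = cooling_rate_bounds(e_temp)
    return max(lo, min(rate, hi))
```

With an eTemp of 0.2, for example, a proposed cooling rate of 0.5 would be lifted to the minimum of 0.8, while 0.99 would pass through unchanged.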
After one experiment, both parameters appear to change far too quickly for my liking, so I have made each of them change ten times more slowly.
Also, I intend to change the zero field evaluations so that every parameter is linear and increases in height won’t result in huge field differences.
Another note: the agent is too rash early on and settles on a successful strategy a little too quickly for my liking. I think it would be wise to stop the agent from adapting until the policy is at least half full, or perhaps until 200 pieces have been placed (roughly equivalent, as 20 * 10 = 200).
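The piece-count version of that idea could look something like the sketch below. The names are hypothetical, and the 200 threshold is simply the 20 * 10 figure from above.

```python
# Assumption: adaptation is gated on a simple count of placed pieces.
MIN_PIECES_BEFORE_ADAPTING = 20 * 10  # 200, as estimated above


def may_adapt(pieces_placed: int,
              threshold: int = MIN_PIECES_BEFORE_ADAPTING) -> bool:
    """Return True once enough pieces have been placed to start adapting."""
    return pieces_placed >= threshold
```

The agent would check `may_adapt(pieces_placed)` before touching the cooling rate or threshold, and leave both alone while it returns False.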
Hmm. Judging from what I’ve seen, this experiment was a failure. Back to version 1.6.