The second problem I’m currently having is that, even with faster learning, Ms. PacMan still performs badly. For example, the ‘optimal’ policy I designed plays quite poorly at times, usually when ghosts are close.
This could be attributed to poor action balancing when evaluating the action decisions the agent returns. For instance, the only reason Ms. PacMan doesn’t charge through a ghost to reach the dots behind it is that I fixed the distance weighting to be squared. Even so, Ms. PacMan still gets stuck in local minima when choosing between two directions (usually until a ghost wanders too close), and I also had to fix the weighting between similar directions (dots on either side).
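To make the problem concrete, here is a minimal sketch of the kind of squared inverse-distance vote summing described above. The function name `direction_scores` and the `(direction, distance, sign)` tuple format are my own inventions for illustration, not the actual implementation:

```python
def direction_scores(actions):
    """Sum weighted direction votes.

    actions: list of (direction, distance, sign) tuples, where sign is
    +1 for attractive targets (dots) and -1 for repulsive ones (ghosts).
    Squaring the distance makes nearby objects dominate: a close ghost's
    repulsion outweighs the pull of several distant dots.
    """
    scores = {}
    for direction, distance, sign in actions:
        weight = sign / (distance ** 2)  # the squared distance weighting
        scores[direction] = scores.get(direction, 0.0) + weight
    return scores

# Three dots roughly 10 tiles up, but a ghost only 2 tiles up:
scores = direction_scores([
    ("up", 10, +1), ("up", 11, +1), ("up", 12, +1),  # dots behind the ghost
    ("up", 2, -1),                                    # ghost in the way
])
# The ghost's -1/4 swamps the dots' combined ~+0.025, so "up" scores negative.
```

With linear (unsquared) weighting, the same three dots would contribute roughly +0.3 against the ghost's -0.5, a much narrower margin, which is why the squaring was needed to keep the agent from walking into ghosts.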
I may have a new weighting idea which removes the need for this squaring. When summing the weighted directions, add an extra coefficient to the calculation equal to 1 divided by the number of actions in that action list. For instance, when going toDots, there are over 100 of them, but when running fromGhosts, there are only 4. In a naive system, Ms. PacMan would charge through a ghost if there were four dots behind it, simply because a dot’s weight is equal to a ghost’s.
To fix this, the weights of actions can be normalised based on the number of returned actions. So when running fromGhosts (assuming all four ghosts are within range and aggressive), the coefficient would be 0.25, whereas the coefficient for chasing dots would be something like 0.005 (at the start of a level, anyway).
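A rough sketch of this normalisation, under my own assumed representation (a dict of named action lists, each holding `(direction, weight)` pairs; `normalised_scores` is a hypothetical name):

```python
def normalised_scores(action_lists):
    """Sum direction votes, scaling each list by 1/len(list).

    action_lists: dict mapping a list name (e.g. "fromGhosts", "toDots")
    to a list of (direction, weight) tuples. The 1/N coefficient means
    every action list contributes the same total budget regardless of
    how many actions it returns.
    """
    scores = {}
    for name, actions in action_lists.items():
        if not actions:
            continue
        coeff = 1.0 / len(actions)  # 0.25 for 4 ghosts, 0.005 for 200 dots
        for direction, weight in actions:
            scores[direction] = scores.get(direction, 0.0) + coeff * weight
    return scores

# All four ghosts and 200 dots lie in the same direction. Without the
# coefficient the dots would win 200 to 4; with it, the two lists cancel.
scores = normalised_scores({
    "fromGhosts": [("up", -1.0)] * 4,
    "toDots": [("up", +1.0)] * 200,
})
```

In this example the fromGhosts list contributes 4 × 0.25 × (-1) = -1 and the toDots list 200 × 0.005 × (+1) = +1, so the dots no longer drown out the ghosts by sheer count.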
There may be problems with this system in the endgame, when there is only one dot left (a coefficient of 1). It is hoped that the weighting based on the ordering of the action lists will overcome this (high weight for the first action list, low for the last).
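One way the ordering weight might combine with the 1/N coefficient is sketched below. The linear priority scheme and the name `combined_scores` are my assumptions; the source only says the first list should weigh more than the last:

```python
def combined_scores(ordered_lists):
    """Sum votes from action lists given in priority order (first = highest).

    ordered_lists: list of action lists, each holding (direction, weight)
    tuples. Each list's contribution is scaled by both a priority weight
    from its position and the 1/N normalisation coefficient.
    """
    n = len(ordered_lists)
    scores = {}
    for rank, actions in enumerate(ordered_lists):
        if not actions:
            continue
        priority = (n - rank) / n   # assumed: linear, high for first list
        coeff = 1.0 / len(actions)  # normalise by list size
        for direction, weight in actions:
            scores[direction] = scores.get(direction, 0.0) + priority * coeff * weight
    return scores

# Endgame: one dot left, so toDots gets coefficient 1.0 — but because
# fromGhosts is ordered first, its higher priority weight still wins.
scores = combined_scores([
    [("left", 1.0)],   # fromGhosts: flee left (priority 1.0)
    [("right", 1.0)],  # toDots: last dot to the right (priority 0.5)
])
```

Here "left" scores 1.0 against 0.5 for "right", so even a coefficient-1 dot list cannot override the flee response, which is the behaviour the ordering weight is hoped to guarantee.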