To ease the shift to relational learning, and because the option seems unnecessary, I am removing the ability to switch actions off in the PacMan domain. Basically, PacMan will only be allowed to switch actions on. This may result in slightly different policies, but it should also improve performance, since half of the action space is cut away. As for using actions as observations, I could probably just use a predicate that takes an action as an argument (perhaps actionOn(Action)). I'll deal with this when I get to it.
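As a rough sketch of what this pruning looks like, assuming a simple action representation (the Action class, the behaviour names, and ACTIONS below are all illustrative, not my actual domain code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    behaviour: str    # e.g. "toDot", "fromGhost" (hypothetical names)
    switch_on: bool   # True = switch the behaviour on, False = off

# Every behaviour originally had an on and an off variant.
ACTIONS = [
    Action("toDot", True), Action("toDot", False),
    Action("fromGhost", True), Action("fromGhost", False),
]

# Dropping the switch-off variants halves the action space.
legal_actions = [a for a in ACTIONS if a.switch_on]
```

With only switch-on actions remaining, the learner never has to reason about deactivation, which is where most of the awkward rules were showing up.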
Furthermore, I am modifying the observations in a similar manner: the comparison between an observation and a constant will now always be '<='. This is partly because of the above change (both actions and observations inherit from an overall condition that uses boolean operators), and partly because 'greater than' rules are rarely seen; most of them appear to be associated with the hand-coded rules that turn actions off. In making this change, I need to introduce a '99' constant in all observations, so that a rule can simply check whether something exists (edible ghosts, for instance): if there is an edible ghost within 99 units, a certainty whenever there are edible ghosts at all, the rule will fire.

This results in relations that aren't too difficult to code: nearestDot(X,5), meaning get all dots within 5 units or less. A further problem with this relational learning is that the result sets can contain multiple candidates (the above example may return up to 20 possible dots to move towards). If PacMan is choosing an action every step, it may be given a choice among all sorts of actions, especially if the policy chooses stochastically from the set of actions at each priority level. This could be tested against the deterministic policy I am currently using to see which strategy works best.
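A minimal sketch of these '<='-only rules, the 99 "exists" bound, and the two candidate-selection strategies; the distances, object names, and policy functions below are illustrative assumptions, not my actual implementation:

```python
import random

def matches(distances, bound):
    """All objects at distance <= bound, e.g. nearestDot(X, 5)."""
    return [obj for obj, d in distances.items() if d <= bound]

# Hypothetical world state: distances from PacMan to each object.
dots = {"dotA": 3.0, "dotB": 4.5, "dotC": 12.0}
edible_ghosts = {"ghost1": 40.0}

# nearestDot(X, 5): may return several candidate dots to move towards.
candidates = matches(dots, 5)

# The 99 constant turns a '<=' rule into an existence check: any edible
# ghost is within 99 units, so the rule fires whenever one exists.
ghost_rule_fires = len(matches(edible_ghosts, 99)) > 0

def deterministic(cands):
    """Always take the first candidate at the highest priority level."""
    return cands[0]

def stochastic(cands):
    """Sample uniformly from the candidate set instead."""
    return random.choice(cands)
```

Running both policies over the same candidate sets for many episodes would be one way to compare the deterministic and stochastic strategies.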