PhD Progress: Problem Domains and Meta-Goal Modules

As I was reviewing a paper for NZCSRSC, I noticed how well its problem definition was set out. I cannot recall any other problem definitions in RRL that are set out quite as well, though there is Martijn’s fingerprint recognition example. This got me thinking about possible domains for the agent to learn in.

One such domain was (not so) simply driving from A to B. Driving involves a great many actions, and multiple goals and conditions must be satisfied for the agent to perform well. These goals and conditions mirror my previous musings on general modules for achieving them.

Anyway, perhaps my work needs to focus on this module/meta-goal achievement direction: breaking a problem into a number of smaller, possibly prioritised, simultaneous goals. For instance, in Pac-Man the goal is to get a high score, but to do this Pac-Man must remain alive. The agent needs to discover how to break a problem down into these areas.
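The Pac-Man decomposition above could be sketched as a set of prioritised goal modules, each scoring candidate actions independently, with a weighted arbiter choosing between them. This is only a minimal illustration: the module names, weights, and toy state format are all my assumptions, not a fixed design.

```python
def survive_module(state, action):
    """Survival goal: penalise actions that move toward a ghost."""
    return -1.0 if action in state["ghost_adjacent"] else 0.0

def score_module(state, action):
    """Scoring goal: reward actions that move toward a pellet."""
    return 1.0 if action in state["pellet_adjacent"] else 0.0

# Modules listed with priority weights: survival outranks scoring.
MODULES = [(10.0, survive_module), (1.0, score_module)]

def choose_action(state, actions):
    """Pick the action with the best priority-weighted module sum."""
    def value(action):
        return sum(w * m(state, action) for w, m in MODULES)
    return max(actions, key=value)

# A pellet lies both left and up, but a ghost is adjacent on the left,
# so the survival module's higher priority steers the agent up.
state = {"ghost_adjacent": {"left"}, "pellet_adjacent": {"left", "up"}}
print(choose_action(state, ["left", "right", "up", "down"]))  # up
```

The point of the sketch is that the agent would need to *discover* modules like these and their priorities, rather than have them hand-coded.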

StarCraft is a much bigger and more demanding example. The agent needs to keep the overall goal in mind, as well as the low-level goals and perhaps the planned ramifications of achieving them.

2 Replies to “PhD Progress: Problem Domains and Meta-Goal Modules”

  1. Hi! I’m an undergrad, and I’m planning to work in the field of controls.

    So I’m not exactly sure I’ve read enough posts to really tell what you’re working on, nor do I think I have the background to understand what you’re working on. But this post reminds me a bit of receding horizon control, where instead of meeting one large goal, you form smaller goals and meet smaller goals one at a time.

    1. Hmmm. That would be an interesting option to explore, and there are similar measures in the field of Reinforcement Learning. Though this would only work in an environment where intermediate rewards are given, rather than a single reward at the end. But in such environments, small, quick policies could be tested on a small number of steps and gradually improved by extending the length.
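The horizon-extension idea in the reply above could be sketched roughly as follows: greedily improve a short policy against a short rollout, then lengthen the horizon and repeat. The toy chain environment (reward 1 per rightward step) and the improvement rule are illustrative assumptions, not anyone's actual method.

```python
def rollout(policy, horizon):
    """Summed intermediate reward over `horizon` steps in a toy 1-D
    chain: each 'right' step earns reward 1, 'left' earns 0."""
    total = 0.0
    for t in range(horizon):
        action = policy[t % len(policy)]
        total += 1.0 if action == "right" else 0.0
    return total

def improve(policy, horizon):
    """One pass of greedy per-step improvement under a fixed horizon."""
    for i in range(len(policy)):
        policy[i] = max(
            ("left", "right"),
            key=lambda a: rollout(policy[:i] + [a] + policy[i + 1:], horizon),
        )
    return policy

# Start with a poor policy; test it on short horizons first, then extend.
policy = ["left", "left", "left"]
for horizon in (1, 2, 4, 8):
    policy = improve(policy, horizon)
print(policy)  # ['right', 'right', 'right']
```

Because the chain hands out intermediate rewards, even a one-step rollout is informative, which is exactly the condition the reply identifies.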
