PhD: Initial ideas

My meeting with a potential supervisor is coming up, so my mind has returned to reinforcement learning musings, and I feel I should note down any ideas that come to me.

So, the learning process is still very vague, but I do have some ideas about it:
– Assumptions. As the agent learns and explores the domain, it can make assumptions, which can serve as theories of sorts. To stop it from making an ass of itself, it will need to test these assumptions against experience somehow to verify their validity (first sketch at the end of this post).
– Cyc. My work in Cyc so far has illustrated how useful the ontology can be. Perhaps all of the agent’s knowledge could be stored within Cyc under a particular microtheory. This would eliminate much of the work of implementing a logic-based structure, as well as giving a large repository of background knowledge.
Hopefully, if my Summer Research goes well, that microtheory will already contain a large amount of information.
Cyc also has its existing inference engine. However, if the agent is to discover information by itself, then the facts it enters into Cyc may not be 100% accurate, so any inferences drawn from its learned information would lose accuracy as well. I’m not sure whether Cyc supports facts that are less than 100% certain, but it may be something I’ll have to deal with (second sketch at the end).
– Episodic or Continuous? When the agent is learning, things are easier if the environment is of an episodic nature. The real world, however, isn’t always like that: because it is continuous, it is always changing. To stop an agent from settling on a single strategy, it could continue learning in the background (from scratch) while still following a learned strategy. Once the background learning has reached a sufficient point, the current strategy and the newly learned strategy could be combined (or swapped) to create a more relevant strategy. This keeps the agent dynamic (third sketch at the end).
This method is but one strategy for tackling the problem. Incremental learning could also be used (perhaps even in tandem).
– Speaking of tandem learning, the agent could also be compiling relevant statistics about the environment for use in machine learning, i.e. finding a model to fit. Gathering the relevant statistics will be the main problem here. Damn feature extraction! Where this learned model would be used is still ambiguous, but from the beginning I have always intended to combine multiple forms of learning to help the agent in any way possible (last sketch below).
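
Since all of the above is pretty hand-wavy, here are some throwaway Python sketches of what I mean. First, the assumptions idea. Everything here (the Assumption class, the lever example, the Laplace smoothing) is just me thinking out loud, nothing settled: the agent keeps a confidence score per assumption and updates it as evidence arrives, so an assumption that keeps failing its tests can be thrown out.

```python
import random
from dataclasses import dataclass


@dataclass
class Assumption:
    """A tentative rule the agent has formed, plus the evidence for it."""
    description: str
    test: callable          # returns True if an observation supports the rule
    support: int = 0
    contradictions: int = 0

    def record(self, observation) -> None:
        """Test the assumption against one piece of experience."""
        if self.test(observation):
            self.support += 1
        else:
            self.contradictions += 1

    def confidence(self) -> float:
        """Laplace-smoothed fraction of supporting evidence."""
        return (self.support + 1) / (self.support + self.contradictions + 2)


# Toy check: the agent assumes "pulling this lever always pays off".
lever = Assumption("pulling the lever yields positive reward",
                   test=lambda reward: reward > 0)
for _ in range(100):
    lever.record(random.gauss(0.3, 1.0))    # stand-in for real experience
print(lever.description, "->", round(lever.confidence(), 2))
```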
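For the Cyc idea, I don’t know the real API well enough yet to write against it, so this is purely a toy stand-in: the Microtheory class and its confidence-tagged facts are my own invention (isa and genls are at least real CycL predicates). The point is to watch how inference over non-100% facts dilutes everything derived from them:

```python
class Microtheory:
    """Toy stand-in for a Cyc microtheory (not the real Cyc API)."""

    def __init__(self, name):
        self.name = name
        self.facts = {}   # (predicate, arg1, arg2) -> confidence in [0, 1]

    def assert_fact(self, fact, confidence=1.0):
        self.facts[fact] = confidence

    def infer_transitive(self, pred):
        """From (pred a b) and (pred b c), conclude (pred a c).

        The conclusion's confidence is the product of its premises',
        so learned (non-100%) facts dilute everything drawn from them.
        """
        for (p1, a, b), c1 in list(self.facts.items()):
            for (p2, b2, c), c2 in list(self.facts.items()):
                if p1 == p2 == pred and b == b2:
                    conclusion = (pred, a, c)
                    confidence = c1 * c2
                    if confidence > self.facts.get(conclusion, 0.0):
                        self.facts[conclusion] = confidence


mt = Microtheory("AgentLearnedMt")
mt.assert_fact(("isa", "Rex", "Dog"))               # hand-entered: certain
mt.assert_fact(("genls", "Dog", "Mammal"), 0.9)     # learned: uncertain
mt.assert_fact(("genls", "Mammal", "Animal"), 0.95)
mt.infer_transitive("genls")
print(mt.facts[("genls", "Dog", "Animal")])         # roughly 0.855
```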
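The background-learning idea is easiest to see on a toy non-stationary bandit. All the numbers and names here are arbitrary; the shape is the point: follow the active strategy, learn a shadow strategy from scratch on the same experience, and periodically blend (or swap) the two.

```python
import random


def run_with_background_learning(env_step, n_steps=10000, swap_every=1000,
                                 n_actions=4, alpha=0.1, epsilon=0.1):
    """Act on `active` while `shadow` relearns from scratch in the background.

    `env_step(action) -> reward` is a stand-in for the real environment.
    """
    active = [0.0] * n_actions    # the strategy currently being followed
    shadow = [0.0] * n_actions    # a fresh strategy learned in the background
    for t in range(1, n_steps + 1):
        if random.random() < epsilon:
            action = random.randrange(n_actions)   # keep exploring a little
        else:
            action = max(range(n_actions), key=active.__getitem__)
        reward = env_step(action)
        # Only the shadow learner updates; the active strategy stays fixed.
        shadow[action] += alpha * (reward - shadow[action])
        if t % swap_every == 0:
            # Blend the strategies (w=1.0 would be an outright swap),
            # then restart the background learner from scratch.
            w = 0.5
            active = [(1 - w) * q + w * s for q, s in zip(active, shadow)]
            shadow = [0.0] * n_actions
    return active


# Toy non-stationary environment: the best arm drifts over time.
state = {"best": 0}
def env_step(action):
    if random.random() < 0.001:
        state["best"] = random.randrange(4)
    return 1.0 if action == state["best"] else 0.0

print(run_with_background_learning(env_step))
```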
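And for the statistics-gathering bullet: the cheapest approach I know of is keeping online running statistics per feature (Welford’s algorithm), so the agent never has to store raw samples. Which features actually matter is, of course, the whole feature extraction problem; the two tracked below are made up.

```python
import random


class RunningStats:
    """Online mean/variance (Welford's algorithm) for one feature stream."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# One tracker per candidate feature; track everything that is cheap.
stats = {"reward": RunningStats(), "time_between_events": RunningStats()}
for _ in range(1000):
    stats["reward"].update(random.gauss(0.5, 1.0))
    stats["time_between_events"].update(random.expovariate(10.0))
for name, s in stats.items():
    print(name, round(s.mean, 3), round(s.variance(), 3))
```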