RL: Baby steps – Sam Sarjant

Lately I have been wondering what the end goal of my PhD is, or even the end goal of my career. I have come up with a vision: a robot using my algorithm to learn, basically, what it is to be human (and beyond!).

I hope to train a robot up much as a newborn baby learns. Of course, being a robot, it should be able to perform computations faster (although the brain is pretty powerful). I hope to do this using reinforcement learning (RL) techniques.

The problem is that reinforcement learning relies on one major thing: a scalar reward. But when interfacing with a human, there are no numerical rewards unless the person explicitly enters them.

So the learner must first find a way of converting positive observations (a smile, or “good job!”, for example) into scalar rewards. To do this, though, it needs to learn which observations are good to begin with. In the natural language learning domain, it would already have to understand natural language (an exceptional feat in itself!) in order to convert positive remarks into rewards.
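To make the conversion step concrete, here is a minimal Python sketch of what the interface might look like. Everything in it is an assumption for illustration: the names `FEEDBACK_REWARDS` and `observation_to_reward`, the reward values, and the "frown" entry are all hypothetical, and the lookup table is hard-coded even though the whole point above is that the learner would have to acquire it.

```python
# Hypothetical, hand-written mapping from observed feedback to a scalar reward.
# In the scenario described above the learner would have to learn this mapping
# itself; it is hard-coded here only to illustrate the interface.
FEEDBACK_REWARDS = {
    "smile": 1.0,
    "good job!": 1.0,
    "frown": -1.0,  # assumed negative example, not from the original post
}


def observation_to_reward(observation: str) -> float:
    """Return a scalar reward for an observation, or 0.0 if it is unrecognised."""
    key = observation.strip().lower()
    return FEEDBACK_REWARDS.get(key, 0.0)
```

For instance, `observation_to_reward("Good job!")` gives a positive reward, while an unrecognised remark falls back to a neutral 0.0. The hard part, of course, is filling in the table without being told.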

These first ‘building-block’ learning scenarios won’t be easy without manual scalar reward input or training examples. So, like a baby does (I guess, anyway; I’m not citing any studies here), the robot will first attempt to imitate actions in order to learn them. It would not be completely blank at the beginning: it would know what its ‘arms’ are and how to move them, so it would be able to imitate behaviour.

This imitating will lay down the first building blocks. Unfortunately, I’m not yet sure how this would come about (once again, how is the reward computed?), but it’s an idea to keep in mind.
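One possible way to get a reward out of imitation, offered purely as an assumption rather than a worked-out answer, is to score the robot on how closely its attempt matches the demonstrated action. The sketch below assumes both can be described as equal-length lists of joint angles; the function name `imitation_reward` and that representation are hypothetical.

```python
import math


def imitation_reward(demonstrated, attempted):
    """Score an attempt by its closeness to the demonstration.

    Both arguments are assumed to be equal-length sequences of joint angles
    (a hypothetical representation). The reward is the negative Euclidean
    distance, so a perfect imitation scores 0 and worse attempts score lower.
    """
    if len(demonstrated) != len(attempted):
        raise ValueError("demonstrated and attempted must have the same length")
    return -math.dist(demonstrated, attempted)
```

This sidesteps the need for a human to enter numbers, but it does assume the robot can already observe the demonstration in the same terms as its own movements, which is a big assumption in itself.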