Wednesday, January 19, 2005

Research update

Things are changing.

I have spent the past two months reading literature on two topics:

1) Reinforcement learning (for machines)
2) Reward signals in biological learning systems

A few months ago I started exploring things other than evolutionary methods. I couldn't help thinking that one of the best ways to improve machine learning methods is to study and mimic biological brains. At the same time I dove deeper into the (machine) reinforcement learning literature, trying to find out which methods are best suited to motor control.

I decided to read the book Reinforcement Learning by Sutton & Barto over Christmas break, something I probably should have read a long time ago. Before reading it, my plan was to try out an actor-critic architecture along with some kind of "sensory predictor" (a module that could learn to predict future sensory inputs given some state and action). The sensory predictor would allow an agent to simulate its environment and make decisions based on those simulations. I was proud of this idea at the time since I had thought of it one day in the middle of class, but I later found out it already existed. I had also heard something about temporal difference learning methods (from the neuroscientific literature - dopamine neuron behavior is very similar to the temporal difference error signal), but I didn't know much about them.

So I read Reinforcement Learning straight through. It turns out that temporal difference learning is a very effective way both to predict future rewards (in the critic) AND to modify the actor's policy to reinforce effective actions (using "eligibility traces"). This works even when the reinforcement received from the environment is sparse. Temporal difference methods learn to associate neutral stimuli with rewards, creating chains of reward-predicting stimuli. Very cool stuff. And the more I read in the neuroscientific literature about reward signals in the brain, the more confident I become that biological brains use temporal difference learning.
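
To make the actor-critic / eligibility trace idea concrete, here's a rough sketch in plain Python. The tiny "chain" environment, the softmax action selection, and all the parameter values are made up for illustration; this isn't the actual Verve code, just the general shape of the algorithm: a critic learning state values from the TD error, an actor adjusting its action preferences with that same error, and traces marking recently visited states and actions.

```
import math
import random

N_STATES = 10           # states 0..9; reward only on reaching state 9
ACTIONS = [-1, +1]      # step left or step right
alpha, beta = 0.1, 0.1  # critic and actor learning rates
gamma, lam = 0.95, 0.8  # discount factor and trace decay

V = [0.0] * N_STATES                            # critic: value estimate per state
prefs = [[0.0, 0.0] for _ in range(N_STATES)]   # actor: action preferences per state

def pick_action(s):
    # softmax over the two action preferences
    exps = [math.exp(p) for p in prefs[s]]
    return 0 if random.random() < exps[0] / sum(exps) else 1

for episode in range(300):
    s = 0
    e_v = [0.0] * N_STATES                           # critic eligibility traces
    e_p = [[0.0, 0.0] for _ in range(N_STATES)]      # actor eligibility traces
    while s != N_STATES - 1:
        a = pick_action(s)
        s_next = max(0, min(N_STATES - 1, s + ACTIONS[a]))
        r = 1.0 if s_next == N_STATES - 1 else 0.0   # sparse reward at the goal
        delta = r + gamma * V[s_next] - V[s]         # temporal difference error
        e_v[s] += 1.0                                # mark the state just visited
        e_p[s][a] += 1.0                             # mark the action just taken
        for i in range(N_STATES):
            V[i] += alpha * delta * e_v[i]           # critic update
            prefs[i][0] += beta * delta * e_p[i][0]  # actor update
            prefs[i][1] += beta * delta * e_p[i][1]
            e_v[i] *= gamma * lam                    # decay the traces
            e_p[i][0] *= gamma * lam
            e_p[i][1] *= gamma * lam
        s = s_next
```

Because the only reward sits at the far end of the chain, the traces are what let that sparse reward propagate back to earlier states and actions instead of only reinforcing the final step.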

Additionally, the Reinforcement Learning book made it clear to me why neural networks are important. In the real world there are far too many possible states for an agent to store an internal table of every state or state-action pair along with a reward estimate for each. A neural network instead approximates that mapping with a small set of parameters, so similar states share estimates and the agent can generalize to states it has never seen.
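
Roughly, instead of one table entry per state, the agent keeps a small weight vector over state features and computes values on the fly. The sketch below uses a linear combination for brevity (a neural network just makes the mapping nonlinear); the feature count and step sizes are arbitrary placeholders, not values from my code.

```
N_FEATURES = 16                   # e.g. coarse-coded position/velocity features
weights = [0.0] * N_FEATURES      # the whole "value table" is just these weights

def value(features):
    # value estimate is a weighted sum of the state features
    return sum(w * f for w, f in zip(weights, features))

def td_update(features, reward, next_features, alpha=0.1, gamma=0.95):
    delta = reward + gamma * value(next_features) - value(features)  # TD error
    for i in range(N_FEATURES):
        weights[i] += alpha * delta * features[i]   # nudge weights toward the TD target
    return delta
```

The key point is that one weight update changes the estimates for every state with similar features, which is exactly the generalization a lookup table can't give you.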

At this point I have made a fairly permanent switch from evolutionary methods to reinforcement learning methods. I have written a good chunk of code so far to begin testing new ideas, including the basic neural network setup, temporal difference learning, and back-propagation. The back-prop will probably be used by the "sensory predictor" (i.e. internal/world model) to learn an internal representation of the external environment. Back-prop seems to be good for this because there will be predicted sensory inputs and actual sensory inputs, naturally leading to error signals at each output neuron.
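
Here's a sketch of what I mean by the sensory predictor (again, not the actual Verve code; the layer sizes, tanh units, function names, and learning rate are just placeholders). The network takes the current sensory vector plus the chosen action and outputs a predicted next sensory vector; the actual next sensory input then gives a target for every output neuron, which is what makes plain back-propagation such a natural fit.

```
import math
import random

N_SENSE, N_ACT, N_HID = 6, 2, 10
N_IN = N_SENSE + N_ACT
w1 = [[random.uniform(-0.1, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
w2 = [[random.uniform(-0.1, 0.1) for _ in range(N_HID)] for _ in range(N_SENSE)]
lr = 0.05

def predict(sense, action):
    # forward pass: (sensory input + action) -> hidden layer -> predicted next sensory input
    x = sense + action
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    out = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return out, h, x

def train_step(sense, action, next_sense):
    out, h, x = predict(sense, action)
    errs = [t - o for t, o in zip(next_sense, out)]          # per-output error signals
    # gradient reaching each hidden unit, computed before any weights change
    grad_h = [sum(errs[k] * w2[k][j] for k in range(N_SENSE)) * (1.0 - h[j] ** 2)
              for j in range(N_HID)]
    for k in range(N_SENSE):                                 # output-layer update
        for j in range(N_HID):
            w2[k][j] += lr * errs[k] * h[j]
    for j in range(N_HID):                                   # hidden-layer update (backprop)
        for i in range(N_IN):
            w1[j][i] += lr * grad_h[j] * x[i]
    return sum(e * e for e in errs)                          # squared prediction error
```

Every simulation step would call something like train_step with whatever sensory input the environment actually produced, so the error signal comes for free.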

I have run a few simple tests to make sure each component is functioning properly. What I'm testing now is a simple classical conditioning response: I'm trying to get my new agents to predict an upcoming reward from a stimulus presented 200 ms before the reward arrives.
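
Here's roughly how I think about that test. The 10 ms time step, the tapped-delay-line stimulus coding (one feature per tick since stimulus onset), and the parameters are assumptions for this sketch, not the exact setup: the stimulus turns on at time zero, the reward arrives 200 ms later, and TD learning gradually moves the reward prediction back to stimulus onset.

```
DT_MS = 10
STEPS = 25                       # ticks per trial; stimulus onset at tick 0
REWARD_STEP = 200 // DT_MS       # reward arrives 200 ms after the stimulus
alpha, gamma, lam = 0.2, 0.98, 0.9

weights = [0.0] * STEPS          # one prediction weight per tick since stimulus onset

def features(t):
    f = [0.0] * STEPS
    f[t] = 1.0                   # tapped delay line: which tick since onset we're on
    return f

for trial in range(500):
    traces = [0.0] * STEPS
    for t in range(STEPS - 1):
        x, x_next = features(t), features(t + 1)
        r = 1.0 if t + 1 == REWARD_STEP else 0.0
        v = sum(w * f for w, f in zip(weights, x))
        v_next = sum(w * f for w, f in zip(weights, x_next))
        delta = r + gamma * v_next - v                           # TD error
        traces = [gamma * lam * e + f for e, f in zip(traces, x)]
        weights = [w + alpha * delta * e for w, e in zip(weights, traces)]

# The prediction at stimulus onset (weights[0]) should approach the discounted
# value of the reward 200 ms away, i.e. roughly gamma ** (REWARD_STEP - 1).
print(round(weights[0], 3), round(gamma ** (REWARD_STEP - 1), 3))
```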

In the future I would like to set up a SETI-like system for training simulated creatures and/or robots. I think it would be a good way to get people involved in AI research, not to mention the extra CPU resources.

Right now I'm preparing a Verve presentation for the ISU Robotics Club tomorrow, which will basically cover everything in this post plus a little more background information. I'm hoping to use Verve to train real robots in the future (as opposed to simulated creatures). Maybe some folks from the club will be interested in working on something like that.

Tyler