The Verve Project: February 2006

So, assuming the brain is trying to maximize long-term reward intake... what is a "reward"? I think it can be defined arbitrarily, of course, but biological brains have a set of hard-wired primary reward/reinforcement signals. (I am using "reward" and "reinforcement" interchangeably here, assuming that both can take positive and negative values.) The most obvious ones are food, sex, and pain. Incoming sensory information that is classified into these categories generates reward signals.

A more interesting hypothetical reward signal comes from novelty. For now, let's call these "novelty rewards." We are drawn to novel situations because of novelty rewards. There is evidence that the same dopamine neurons that fire during unexpected "normal" rewards (e.g., food) also fire in a wide range of novel situations. The book Satisfaction: The Science of Finding True Fulfillment, by Gregory Berns, discusses at length how novelty is a major source of reward in humans.

But I think there's more to it than simple novelty. If we were rewarded simply for experiencing novel situations, we would be drawn to all kinds of random signals (e.g., radio static) like moths to a street lamp. As Juergen Schmidhuber proposed in the early 1990s, a better model of novelty-based rewards is that it is rewarding to experience a reduction in uncertainty over time (see Juergen's papers on curiosity here: http://www.idsia.ch/~juergen/interest.html). Basically, it is rewarding to learn. I'll call this "learning rewards." The whole idea rests on our predictions about the world: "novelty," in this sense, means the same thing as "uncertainty" or "unpredictability." Fortunately for computational modelers, uncertainty is easy to quantify with a predictive model, for example as the size of the model's prediction errors.
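One simple way to write this down is sketched below. It is an illustrative formalization under my own assumptions (squared prediction error, averaged over blocks of k time steps), not necessarily the exact formulation in Schmidhuber's papers. The model predicts each next observation, uncertainty is its average error within a block, and the learning reward is the drop in that average from one block to the next:

```latex
% e_t: squared error of the model's prediction \hat{x}_{t+1} of observation x_{t+1}
e_t = \left( x_{t+1} - \hat{x}_{t+1} \right)^2
% U_j: uncertainty, the mean prediction error in block j (k steps per block)
U_j = \frac{1}{k} \sum_{t=(j-1)k+1}^{jk} e_t
% r_j: learning reward, the reduction in uncertainty from block j-1 to block j
r_j = U_{j-1} - U_j
% Summed over blocks 2..J, the rewards telescope to the total drop in uncertainty
\sum_{j=2}^{J} r_j = U_1 - U_J
```

The telescoping sum is why random static is not rewarding in the long run: if a signal is pure noise with variance \sigma^2, no predictor can push U_J much below \sigma^2, so once the little that is learnable (such as its average) has been learned, U stops falling and the learning rewards average out to zero, even though the raw novelty U itself stays high.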

If an agent remains in a situation where predictable information can be gained over time, we can say the following:

Uncertainty/novelty/unpredictability is reduced
Learning occurs (specifically, the predictive model gets better)
"Learning rewards" should be given to the agent
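The three steps above can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not in the original: the "agent" watches a stream of numbers, its predictive model is a hypothetical lookup table (`TablePredictor`) that predicts the next value from the current one, uncertainty is mean squared prediction error over blocks of 100 steps, and the learning reward is the drop in that error from one block to the next. It contrasts a learnable repeating pattern with random "static."

```python
import random


class TablePredictor:
    """Hypothetical predictive model: predicts the next value from the current one."""

    def __init__(self, lr=0.5):
        self.lr = lr      # learning rate (an assumed value)
        self.table = {}   # current value -> predicted next value

    def predict(self, current):
        # Contexts the model has never seen default to predicting 0.0.
        return self.table.get(current, 0.0)

    def update(self, current, nxt):
        # Move the stored prediction part of the way toward what actually happened.
        p = self.predict(current)
        self.table[current] = p + self.lr * (nxt - p)


def novelty_and_learning_rewards(signal, block=100, lr=0.5):
    """Return per-block (novelty, learning_reward) lists.

    novelty[j]          mean squared prediction error in block j (uncertainty)
    learning_reward[j]  novelty[j] - novelty[j + 1] (reduction in uncertainty)
    """
    model = TablePredictor(lr)
    errors = []
    for current, nxt in zip(signal, signal[1:]):
        errors.append((nxt - model.predict(current)) ** 2)  # predict first...
        model.update(current, nxt)                           # ...then learn
    novelty = [sum(errors[i:i + block]) / block
               for i in range(0, len(errors) - block + 1, block)]
    learning_reward = [prev - cur for prev, cur in zip(novelty, novelty[1:])]
    return novelty, learning_reward


rng = random.Random(0)
pattern = [t % 4 for t in range(2001)]             # learnable: 0, 1, 2, 3, 0, ...
static = [rng.randrange(4) for _ in range(2001)]   # unlearnable random "static"

for name, sig in [("pattern", pattern), ("static", static)]:
    novelty, learning = novelty_and_learning_rewards(sig)
    print(f"{name}: final novelty {novelty[-1]:.3f}, "
          f"total learning reward {sum(learning):.3f}, "
          f"late learning reward {sum(learning[5:]):.3f}")
```

On the repeating pattern, the learning reward is positive early on and then drops to zero once the pattern has been learned, and its novelty falls to zero as well. On the static, the model can only pick up the average value; after that its novelty stays well above zero while its learning rewards merely fluctuate around zero, which is the "moth to a street lamp" trap that learning rewards avoid.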

These learning rewards are extremely powerful. They enable very open-ended learning in which agents are drawn to situations that contain learnable, predictable information, and they drive agents to experiment at the edge of their current knowledge of the world. Because they push agents to improve their predictive models, they are also very beneficial for planning.

Subjectively, it feels rewarding to gain information. When you are working on a crossword puzzle and the "aha!" moment suddenly hits, it is the resulting transition (from uncertainty to certainty) that feels good.

Monday, February 27, 2006

What is Rewarding?

The Reward Hypothesis