Thursday, November 16, 2006

Videos - Reinforcement Learning Control Tasks

These videos are from two reinforcement learning control problems I set up for my master's thesis work last year.


Pendulum Swing-Up
A physically-simulated pendulum is controlled by a Verve agent (http://verve-agents.sourceforge.net). Based on simple reinforcement signals (+1 when the pendulum is close to vertical, -1 otherwise), the agent learns to swing the pendulum upright and balance it after about 60 trials.
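
The reinforcement signal described above is simple enough to write down directly. Here is a minimal sketch in Python; it is not the code used in the thesis, and the 15-degree tolerance for "close to vertical" is an assumed value chosen only for illustration.

```python
import math

def swing_up_reward(angle_from_vertical, tolerance=math.radians(15)):
    """Return +1 when the pendulum is close to vertical, -1 otherwise.

    angle_from_vertical: radians, where 0 means perfectly upright.
    tolerance: hypothetical cutoff for "close to vertical" (an assumption).
    """
    # Wrap the angle into [-pi, pi] so a full rotation still counts as upright.
    wrapped = math.atan2(math.sin(angle_from_vertical),
                         math.cos(angle_from_vertical))
    return 1.0 if abs(wrapped) <= tolerance else -1.0
```

The agent never sees the angle threshold itself, only the resulting +1 or -1, so it has to discover the swing-up motion on its own.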

Cart-Pole/Inverted Pendulum
A physically-simulated cart is controlled by a Verve agent (http://verve-agents.sourceforge.net) in order to balance an attached pole. Based on simple reinforcement signals (-1 when the pole falls over or the cart goes off the edge of the platform, +1 otherwise), the agent learns to balance the pole for 30 minutes after about 600 trials.
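
For comparison with the pendulum task, the cart-pole signal can be sketched the same way. This is only an illustration: the function name, the failure angle, and the platform half-length are hypothetical values, not the ones from the actual simulation.

```python
def cart_pole_reward(pole_angle, cart_position, max_angle=0.8, half_track=2.4):
    """Return (reward, trial_over) for one simulation step.

    pole_angle: radians from vertical; cart_position: distance from center.
    max_angle and half_track are assumed failure limits for illustration.
    """
    pole_fell = abs(pole_angle) > max_angle
    off_platform = abs(cart_position) > half_track
    if pole_fell or off_platform:
        return -1.0, True   # failure ends the trial
    return 1.0, False       # still balancing
```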


Videos - Magnet Toy Simulation

These are videos of a simulation I made earlier this year. I was trying to simulate a magnet toy using OPAL/ODE. The toy has two sets of magnets, each attached to a rotating axle. Here's a picture (thanks, JO):

Videos - Artificial Evolution of Humanoid Behaviors

These videos are from a project I did in 2003. A physically simulated humanoid is controlled by an artificial neural network which senses joint angles and controls muscle forces. A genetic algorithm optimizes the neural network weights to improve performance on some motor control task.
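
The general approach can be sketched in a few lines of Python. This is a toy version under several assumptions: the network layout, the truncation-plus-mutation scheme, and the stand-in fitness function are all hypothetical, and in the real project fitness came from running the physics simulation rather than from a fixed set of samples.

```python
import math
import random

def network_output(weights, inputs, hidden=3):
    """Tiny one-hidden-layer network; weights is a flat list.

    Each hidden unit owns len(inputs) input weights plus one output weight.
    """
    n_in = len(inputs)
    out = 0.0
    for h in range(hidden):
        start = h * (n_in + 1)
        total = sum(w * x for w, x in zip(weights[start:start + n_in], inputs))
        out += weights[start + n_in] * math.tanh(total)
    return math.tanh(out)

# Stand-in task: map (hypothetical) joint angles to a target muscle signal.
SAMPLES = [((0.0, 0.0), 0.0), ((1.0, 0.0), 0.5), ((0.0, 1.0), -0.5)]

def toy_fitness(weights):
    """Higher is better: negative squared error over the samples."""
    return -sum((network_output(weights, x) - t) ** 2 for x, t in SAMPLES)

def evolve(fitness, n_weights, pop_size=20, generations=30, sigma=0.3, seed=0):
    """Keep the best quarter each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 4]
        children = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

best = evolve(toy_fitness, n_weights=9)
```

Because the parents survive unchanged into the next generation, the best fitness in the population can never get worse from one generation to the next.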


Jumping Vertically





Standing Upright





Walking Forward


Tuesday, November 07, 2006

We are All Allotted a Constant Amount of Mental Energy

Given that we humans all have similar brain structures of comparable sizes, bandwidths, etc., we are all allotted a constant amount of mental energy which enables us to perform a certain amount of mental work per unit time. You could think of "mental work" as "planning" or "thinking." To be more precise, let's define it as reducing the uncertainty (or increasing the predictability) of a situation.

Let's look at some examples:
  1. Visual scenes: The more time you spend looking at a landscape, an interior space, or a painting, the better you are able to predict it. At first you aren't able to predict the spatial relationships between elements, but after some time it becomes easy.
  2. Logic and word puzzles: After expending a certain amount of mental energy, uncertainty is reduced, and the puzzle is solved.
  3. People: At first, strangers can be unpredictable. The more you interact with a person, the better you understand him or her, and the easier it becomes to predict his or her behavior. (Most people's personalities are moving targets, though, so it isn't possible to reach a state of completely accurate prediction.)
  4. TV shows: The more you watch a certain show, the better you are able to predict subtle interactions between characters and high-level plot elements.
  5. Sports and music: Learning new motor behaviors (like new sports, new pieces of music, and new musical instruments) requires a lot of attention. It takes effort to coordinate muscle movements into desired patterns. Over time, though, repetitive training produces smooth, synchronized motions with little effort.
  6. Cooking: The more brain-time you invest, the better you understand how ingredients interact, and the better you can predict what things will taste like.
  7. Investments: At first you have no idea how stable various markets are. Over time, your uncertainty is reduced, and you can make well-informed decisions based on past experience.
In each situation it takes a certain amount of mental work to reduce uncertainty, just like it takes a certain amount of physical work to move heavy objects. You could say that complex situations are "heavier" than others.

Even though most of our brain structures function in parallel, the whole attention system is a serial mechanism. Think about it. You can only focus on one discrete thought at a time. Try looking at a complex scene or object. You are able to look at the whole thing at once, but you can't attend to more than one component at a time. Right now I'm looking at a house plant in my living room, and I can't simultaneously think "plant" and "leaf." I have to focus on either a small part or the whole thing. It's complicated because you can think of a set of similar objects all at once ("all the books on my bookshelf"), but you're still limited to a single discrete thought.

Imagine your attention system constantly switching among various thoughts. It spends some time working on a crossword puzzle, switches for a half second to think about what to have for lunch, switches back to the puzzle, switches to the sound coming from the radio for a bit, switches back to the puzzle... Every time attention shifts, it starts applying mental force to a new mental object, moving it a little bit across the spectrum from unpredictable to predictable.

The point here is that we all have a similar (within an order of magnitude) amount of mental energy available to us per unit time. Since birth we have all expended a similar amount of this energy on something. So everyone must have some kind of hidden talent related to those things that occupy most of his or her thoughts.

(The whole idea of "useful" mental work is another story. Utility can be defined in a variety of ways. A common one might be something along the lines of "the good of the many." If a person's goals are aligned with the utilitarian viewpoint, he or she would spend his or her mental energy on problems that benefit the most people. If the reward hypothesis is correct, we spend our mental energy on those things that we expect [based on previous experience] will bring us the most rewards. This gets into the whole area of motivation, which is beyond the scope of this article. To attain the highest level of utility, however it is defined, it is probably necessary to spend time thinking of ways to improve one's own thinking abilities, or metalearning.)

Now Listening to...

...audio lectures from Jeremy Wolfe's course, Intro to Psychology, Fall 2004, on MIT OpenCourseWare. Even though it's an intro course, I think it'll be good. Jeremy sounds like a great teacher (you can just tell after listening to him for 30 seconds). And I think it's good to hear the fundamentals lots of times from a variety of teachers.

I didn't quite finish listening to Gerald Schneider's Animal Behavior course (I finished 22 out of 37 lectures). I heard all I wanted to hear and decided to move on.

Update on Verve Development

For a while my posts have been focused on things other than the Verve project itself. One reason is that I enjoy posting about random interesting ideas that pop into my head. The other reason is that I've been thinking of starting a new software development effort. It would have the same general goals as Verve, but it would use much more advanced methods. For instance, I have been doing a lot of research into the general topic of context representation. I have prototyped a few subcomponents and have drawn lots of plans. Rather than ripping out the existing context representation in Verve (i.e. its dynamically-growing radial basis function system) and reworking all the components that connect to it, I'll probably just leave it how it is and continue building this new system.
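
For readers unfamiliar with the idea, here is a rough sketch of what a dynamically-growing radial basis function representation can look like. It is not Verve's actual code: the class name, the Gaussian width, and the rule for when to add a new basis function are all assumptions made for illustration.

```python
import math

class GrowingRBF:
    """Gaussian basis functions that are added as new inputs are observed."""

    def __init__(self, width=0.5, new_center_threshold=0.5):
        self.width = width                                # assumed Gaussian width
        self.new_center_threshold = new_center_threshold  # assumed growth rule
        self.centers = []

    def activations(self, x):
        """Response of every existing basis function to input x."""
        return [math.exp(-self._dist2(x, c) / (2 * self.width ** 2))
                for c in self.centers]

    def observe(self, x):
        """Add a basis function at x if no existing one responds strongly."""
        acts = self.activations(x)
        if not acts or max(acts) < self.new_center_threshold:
            self.centers.append(tuple(x))
            acts = self.activations(x)
        return acts

    @staticmethod
    def _dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
```

The representation starts empty and only spends basis functions on the parts of the input space the agent actually visits.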

Things are going very well overall. I'm having a blast. I think it's important to have a blast when you're doing research. It's good for morale. And if your morale is suffering, you're not going to do good research.

I'll post more details as things progress.