Thursday, December 14, 2006

Testing a New Context Representation on Visual Images

This semester I've been working hard to design a totally new type of "context representation." The purpose of this system is to represent sensory information in a form that is compact, retains as much information as possible, and is useful for behavior. What I have so far is a hierarchical inference system that can do nonlinear dimensionality reduction on the input data while maintaining topographic relationships at each level.

These two pictures show a single layer during and after training on a set of images of contrasting edges. Notice the smooth topographic representation.


The next one shows the result of training on a set of line images.


The next picture shows a single layer in training, with two other images below it. The bottom left is the actual input image, and the bottom right is a reconstruction of the input image (i.e. after compression and decompression). "Reconstruction" here is performed with a generative model. Note that the compression level here is easily adjustable, which gives a clear speed/accuracy trade-off.


The final picture shows a hierarchical setup. Again, the bottom images show the actual input (left) and reconstruction (right). In the middle is a layer of four processing elements with limited receptive fields (each looks at one fourth of the input image). On top is a layer connected to the layer below; its receptive field covers the entire input image. Again, notice the smooth topographic representation in the middle layer, even between the four processing elements.


I'm still in the middle of running lots of experiments to verify that everything is working properly. I really want to make sure I'm getting correct results (the reconstructed images should match the inputs well) with efficient runtime performance (which should be good with a hierarchical architecture). I wouldn't feel too bad spending another semester testing this thing since it's one of the most critical components of an intelligent system. Lots of other components are context-dependent, meaning that a good context representation will enable other parts to do their jobs better.

New Tyler & Shana Blog

Shana and I started a new blog for things that relate to both of us. We started it off with pictures of our new house. Check 'em out.

Wednesday, December 13, 2006

Microsoft Robotics Studio 1.0

Microsoft Robotics Studio 1.0 was just released. This letter describes what it's all about, as does this video.

It's pretty exciting to have a standard development environment for robotics applications. If it's robust enough, I might end up using it for research. In the past I have used OPAL/ODE for physics and OGRE for visualization, which is great because I have full control over the entire simulation, but it requires a lot of overhead to set up each new control task. Using Microsoft Robotics Studio would save me tons of time and give me lots of new features (e.g., debugging sensors via a web browser).

Thursday, December 07, 2006

Mario Paint Caricature Video, 1993 CES

At the Consumer Electronics Show in Chicago, 1993, Nintendo had artists at their booth showing off the new SNES game Mario Paint by doing live caricatures of people. This is the caricature video of my brother (Tony, on the right), and me (Tyler, on the left).



Thursday, November 16, 2006

Videos - Reinforcement Learning Control Tasks

These videos are from two reinforcement learning control problems I set up for my master's thesis work last year.


Pendulum Swing-Up
A physically-simulated pendulum is controlled by a Verve agent (http://verve-agents.sourceforge.net). Based on simple reinforcement signals (+1 when the pendulum is close to vertical, -1 otherwise), the agent learns to swing the pendulum upright and balance it after about 60 trials.






Cart-Pole/Inverted Pendulum
A physically-simulated cart is controlled by a Verve agent (http://verve-agents.sourceforge.net) in order to balance an attached pole. Based on simple reinforcement signals (-1 when the pole falls over or the cart goes off the edge of the platform, +1 otherwise), the agent learns to balance the pole for 30 minutes after about 600 trials.
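
Here's a minimal sketch of the two reinforcement signals described above, just to make the +1/-1 scheme concrete. This is not Verve's actual API; the function names, the tolerance, the fall angle, and the platform half-width are all assumed values for illustration.

```python
import math


def pendulum_reward(angle_from_vertical, tolerance=0.2):
    """+1 when the pendulum is close to vertical, -1 otherwise.

    angle_from_vertical is in radians, assumed to lie in [-pi, pi].
    tolerance is an assumed threshold for "close to vertical".
    """
    return 1.0 if abs(angle_from_vertical) <= tolerance else -1.0


def cart_pole_reward(pole_angle, cart_position,
                     max_angle=math.pi / 2, half_track=2.0):
    """-1 when the pole falls over or the cart leaves the platform, +1 otherwise.

    max_angle (radians) and half_track (platform half-width) are assumed values.
    """
    fallen = abs(pole_angle) >= max_angle
    off_edge = abs(cart_position) > half_track
    return -1.0 if (fallen or off_edge) else 1.0
```

The agent never sees anything richer than these signals, so the swing-up and balancing strategies have to be discovered through trial and error.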


Videos - Magnet Toy Simulation

These are videos of a simulation I made earlier this year. I was trying to simulate a magnet toy using OPAL/ODE. The toy has two sets of magnets, each attached to a rotating axle. Here's a picture (thanks, JO):












Videos - Artificial Evolution of Humanoid Behaviors

These videos are from a project I did in 2003. A physically simulated humanoid is controlled by an artificial neural network which senses joint angles and controls muscle forces. A genetic algorithm optimizes the neural network weights to improve performance on some motor control task.
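
The general idea can be sketched in a few lines of Python. This is not the code from the 2003 project: the selection scheme (keep the best half), the crossover and mutation operators, and all parameter values are assumptions, and toy_fitness is a hypothetical stand-in for running the physics simulation and scoring the humanoid's behavior.

```python
import random


def evolve(fitness, n_weights, pop_size=20, generations=50, sigma=0.1, seed=0):
    """Evolve a flat list of network weights that maximizes fitness(weights)."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness and keep the best half unchanged (truncation selection).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation on every weight.
            child = [rng.choice(pair) + rng.gauss(0, sigma)
                     for pair in zip(mom, dad)]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(weights):
    """Hypothetical stand-in for a simulation score (e.g. jump height)."""
    target = [0.5, -0.3, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best = evolve(toy_fitness, n_weights=3)
```

Because the best half of each generation survives unchanged, the best fitness in the population never gets worse from one generation to the next.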


Jumping Vertically





Standing Upright





Walking Forward


Tuesday, November 07, 2006

We are All Allotted a Constant Amount of Mental Energy

Given that we humans all have similar brain structures of comparable sizes, bandwidths, etc., we are all allotted a constant amount of mental energy which enables us to perform a certain amount of mental work per unit time. You could think of "mental work" as "planning" or "thinking." To be more precise, let's define it as reducing the uncertainty (or increasing the predictability) of a situation.

Let's look at some examples:
  1. Visual scenes: The more time you spend looking at a landscape, an interior space, or a painting, the better you are able to predict it. At first you aren't able to predict the spatial relationships between elements, but after some time it becomes easy.
  2. Logic and word puzzles: After expending a certain amount of mental energy, uncertainty is reduced, and the puzzles disappear.
  3. People: At first, strangers can be unpredictable. The more you interact with a person, the better you understand him or her, and the easier it is to predict his or her behavior. (Most people's personalities are moving targets, though, so it isn't possible to reach a state of completely accurate prediction.)
  4. TV shows: The more you watch a certain show, the better you are able to predict subtle interactions between characters and high-level plot elements.
  5. Sports and music: Learning new motor behaviors (like new sports, new pieces of music, and new musical instruments) requires a lot of attention. It takes effort to coordinate muscle movements into desired patterns. Over time, though, repetitive training produces smooth, synchronized motions with little effort.
  6. Cooking: The more brain-time you invest, the better you understand how ingredients interact, and the better you can predict what things will taste like.
  7. Investments: At first you have no idea how stable various markets are. Over time, your uncertainty is reduced, and you can make well-informed decisions based on past experience.
In each situation it takes a certain amount of mental work to reduce uncertainty, just like it takes a certain amount of physical work to move heavy objects. You could say that complex situations are "heavier" than others.

Even though most of our brain structures function in parallel, the whole attention system is a serial mechanism. Think about it. You can only focus on one discrete thought at a time. Try looking at a complex scene or object. You are able to look at the whole thing at once, but you can't attend to more than one component at a time. Right now I'm looking at a house plant in my living room, and I can't simultaneously think "plant" and "leaf." I have to focus on either a small part or the whole thing. It's complicated because you can think of similar sets of objects at a time ("all the books on my bookshelf"), but you're still limited to a single discrete thought.

Imagine your attention system constantly switching among various thoughts. It spends some time working on a crossword puzzle, switches for a half second to think about what to have for lunch, switches back to the puzzle, switches to the sound coming from the radio for a bit, switches back to the puzzle... Every time attention shifts, it starts applying mental force to a new mental object, moving it a little bit across the spectrum from unpredictable to predictable.

The point here is that we all have a similar (within an order of magnitude) amount of mental energy available to us per unit time. Since birth we have all expended a similar amount of this energy on something. So everyone must have some kind of hidden talent related to those things that occupy most of his or her thoughts.

(The whole idea of "useful" mental work is another story. Utility can be defined in a variety of ways. A common one might be something along the lines of "the good of the many." If a person's goals are aligned with the utilitarian viewpoint, he or she would spend his or her mental energy on problems that benefit the most people. If the reward hypothesis is correct, we spend our mental energy on those things that we expect [based on previous experience] will bring us the most rewards. This gets into the whole area of motivation, which is beyond the scope of this article. To attain the highest level of utility, however it is defined, it is probably necessary to spend time thinking of ways to improve one's own thinking abilities, or metalearning.)

Now Listening to...

...audio lectures from Jeremy Wolfe's course, Intro to Psychology, Fall 2004, on MIT OpenCourseWare. Even though it's an intro course, I think it'll be good. Jeremy sounds like a great teacher (you can just tell after listening to him for 30 seconds). And I think it's good to hear the fundamentals lots of times from a variety of teachers.

I didn't quite finish Gerald Schneider's Animal Behavior course (I finished 22 out of 37 lectures). I heard all I wanted to hear and decided to move on.

Update on Verve Development

For a while my posts have been focused on things other than the Verve project itself. One reason is that I enjoy posting about random interesting ideas that pop into my head. The other reason is that I've been thinking of starting a new software development effort. It would have the same general goals as Verve, but it would use much more advanced methods. For instance, I have been doing a lot of research into the general topic of context representation. I have prototyped a few subcomponents and have drawn lots of plans. Rather than ripping out the existing context representation in Verve (i.e. its dynamically-growing radial basis function system) and reworking all the components that connect to it, I'll probably just leave it how it is and continue building this new system.
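
For readers who haven't seen one, here is a rough sketch of what a dynamically-growing radial basis function representation can look like. It is not Verve's implementation: the class name, the growth rule (add a center whenever an observation is far from every existing center), and the parameter values are assumptions chosen to show the general technique.

```python
import math


class GrowingRBF:
    """A set of Gaussian RBF centers that grows as new regions are observed."""

    def __init__(self, width=0.5, new_center_distance=1.0):
        self.width = width                          # assumed Gaussian width
        self.new_center_distance = new_center_distance  # assumed growth threshold
        self.centers = []

    def observe(self, x):
        """Add a center at x if no existing center is nearby; return activations."""
        if all(math.dist(x, c) > self.new_center_distance for c in self.centers):
            self.centers.append(tuple(x))
        return self.activations(x)

    def activations(self, x):
        """Gaussian activation of every center for observation x."""
        return [math.exp(-math.dist(x, c) ** 2 / (2 * self.width ** 2))
                for c in self.centers]


rbf = GrowingRBF()
first = rbf.observe((0.0, 0.0))   # first observation always creates a center
rbf.observe((0.1, 0.0))           # close to an existing center: no growth
rbf.observe((3.0, 0.0))           # far from everything: a second center is added
```

The list of activations is the "context" that the rest of the agent sees, which is why so many other components depend on it.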

Things are going very well overall. I'm having a blast. I think it's important to have a blast when you're doing research. It's good for morale. And if your morale is suffering, you're not going to do good research.

I'll post more details as things progress.

Thursday, September 21, 2006

Intuitiveness

What does it mean for something to be "intuitive?"

It must have something to do with our intuition. Ok... so what's "intuition?" I like to define it as previous knowledge (loosely defined), either instinctive or gained through learning. Thus, something that is intuitive is something that takes advantage of previous knowledge. A key point here is that intuitiveness is subjective.

If you have played poker games in the past, the rules of a new poker game will be intuitive if they rely on knowledge gained from other poker games. If you have used Microsoft products in the past, new Microsoft products will be intuitive as long as they are designed like the old ones. If you have driven a car in the past, driving a new car should be easy. (Driving may not be very intuitive at first since we don't otherwise press levers to change velocity and turn a wheel to change directions... except in video games.) In all of these cases standards are important since they ensure that previous knowledge is exploited.

This thought process started a year and a half ago in a class assignment for "Interaction Methods for Emerging Technologies." The assignment was to explain why direct manipulation devices are usually preferred by users. (Direct manipulation devices are those in which the user's actions directly affect the end object, as opposed to devices that add one or more levels of indirect manipulation. For example, a computer mouse has one level of indirect manipulation since it indirectly controls the pointer on the screen.) This was my answer:
Why are direct-manipulation interfaces preferred?

I will use the phrase "knowledge transfer" to refer to the amount of previous knowledge that can be applied to a new domain. Direct-manipulation exploits a lot of knowledge transfer because users manipulate the device in a similar way to how they manipulate everyday objects. Direct-manipulation usually requires little learning, thus less effort and/or frustration when using a new device.

Additionally, I would say that intuitiveness in any domain is directly proportional to the amount of knowledge transfer being used, maybe going so far as to say intuitiveness is equivalent to the utilization of knowledge transfer. So things can be intuitive to some people and not others depending on their experience. Direct-manipulation is more intuitive to almost everyone because almost everyone has had a lot of experience manipulating everyday objects. A new Microsoft product, on the other hand, would be intuitive for people experienced with Microsoft products because of the knowledge transfer involved, but not for others who aren't used to them (hence the effectiveness of interface standardization).

Monday, February 28, 2005

Monday, September 18, 2006

ASME IDETC & CIE Conference 2006


I presented a paper on Verve at the 2006 ASME (American Society of Mechanical Engineers) IDETC & CIE conference (International Design Engineering Technical Conferences and Computers and Information in Engineering... whew). The official reference is the following:

Streeter, T., Oliver, J., & Sannier, A. 2006. A General Purpose Open Source Reinforcement Learning Toolkit. In Proceedings of the ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference.
The presentation went pretty well. I was put in a strange topic area, though (Knowledge Management in Design Automation). I think the audience liked the live demos, and I had some good questions from them afterwards. The paper and presentation slides are available in the publications section of my website.

I attended a talk called "The Spirituality of Engineering" by two professors from Delft University in the Netherlands. They showed a video of a grad student from Russia doing a 6-month research internship in Paris (developing a control algorithm for the inverted pendulum problem). Then the speakers and the audience talked about various types of symbolism present in the film.

I also went to two industry/government panel discussions: "Challenges Confronting Mechanical Systems with Emphasis on Intelligence," and "Industry & Government Perspective on Issues and Challenges for Robotics." James Albus gave a presentation at the first one called "Building Brains for Thinking Machines," which was a brief summary of his work on hierarchical control systems. I talked with Dr. Albus briefly before his talk about some of his books (Intelligent Systems and Engineering of Mind).

Saturday, September 02, 2006

Now Listening to...

...audio lectures from another Gerald Schneider course, "Animal Behavior," on MIT OpenCourseWare.

I'm really enjoying taking courses on my own schedule. I just fit an entire course (Neuroscience and Behavior) into two weeks. (Actually I only listened to 16 out of the 30 audio lectures available, but they covered the topics I cared about.) It's great because I usually get bored with normal courses by the end of the semester. I have a pretty good feel for when I'm in a learning mood, and it usually doesn't coincide with scheduled lecture times. Now I can just keep my iPod with me all the time and listen to a chunk of a lecture while I'm riding the bus, exercising, or waiting for software to compile. I don't think I could go back to the old way. I'm done with old school school.

Thursday, August 24, 2006

Are Dreams Undirected Planning Sequences?

I had a thought. Say we have mental models of the world available for planning, and we can direct our planning sequences (i.e. trains of thought) through imagined spaces in order to achieve some goal. That's in the awake state. During dreaming, there is reduced activity in the prefrontal cortex, an area that is usually associated with planning/judgment. Maybe when we dream, our minds move through the same imagined spaces, but they do so in an undirected manner because the high-level planning control center is turned off. It's like the car is still moving, but no one's in the driver's seat. Without the prefrontal cortex, we can't really judge the absurdity of different situations, which explains why dreams always seem normal until we wake up.

Currently Listening To...

...audio lectures from Gerald Schneider's course, "Neuroscience and Behavior," on MIT OpenCourseWare.

Main Areas of Investigation

The main areas of the Verve agent design that need work are:

  • Context representation - should be more computationally efficient, and it should include a short-term memory of previous observations
  • Hierarchical action construction - complex actions should be built up automatically from low-level primitives
  • Experimentation with curiosity - this will be much more interesting once the previous two areas have been developed

Tuesday, August 22, 2006

I'm back from New York

I just got back from my internship in New York on Sunday night. It was a lot of fun. I had some great discussions with the other researchers in my group. My project was to help develop and test a computational model of the cerebellum. I learned a lot about the cerebellum (of course) and was introduced to several information theory-based approaches (like Linsker's Infomax framework and the subsequent ICA work of Bell & Sejnowski).

Now that I'm home, I'll be able to work on my own research again. I'd like to start looking at better context representations soon, possibly using ICA.

Sunday, August 06, 2006

The Classical and Romantic Qualities of Jazz Improvisation

I saw Arturo Sandoval play at the Blue Note on Friday. It was great, of course. I first heard him play in high school when I bought his album Hot House.

As I was listening, I started thinking about what goes on in your head when you improvise jazz. I think it helps to separate things into two categories: playing well technically (high notes, fast difficult patterns, etc.), and playing well emotionally (having a good gut feeling for what sounds good when). In other words, knowing how to play and knowing what to play. These two components trade control back and forth throughout a solo; one part chooses which pattern to play next, the other part executes it, the first part chooses another pattern, the other part executes it...

Some improvisers are good at one but not the other. For example, some are very proficient at playing difficult material, like high notes and fast, intricate syncopated sequences. But they don't have any clear direction to their solos (i.e. they're not good at choosing patterns). Others lack technical skill (i.e. they haven't mastered a lot of patterns), but they choose sounds in a way that produces an interesting overall message. They play with soul. I like to think of these two types of skill as Classical and Romantic Quality, as defined by Robert Pirsig in Zen and the Art of Motorcycle Maintenance. (I also think these two categories can be mapped roughly onto the brain's cognitive/analytical processes and its emotional/limbic processes, respectively.)

The best improvisers are good at both kinds of Quality. Their brains have large libraries filled with high-quality patterns (Classical), and they have amazing selection mechanisms (Romantic). Their solos have relevant messages, and they're constructed out of masterful building blocks.

I haven't played jazz for a little while (about a year and a half), but I'd like to play again. I stopped playing because grad school got busy, mainly, but I also became frustrated with how I was playing. One thing was that I didn't like how I sounded in recordings. Something just wasn't right. Kind of like how most people don't like to hear their own voice. Another thing was that I wasn't really advancing as an improviser. I think it was because I was too focused on achieving Classical Quality. I would always practice with the goal of mastering some new pattern in all twelve keys, for example. I was never able to focus on learning to fashion the overall structure of a solo into anything meaningful. Then when it came time to play a solo with a group, I would go on autopilot, randomly jumping from one well-learned pattern to the next. My solos lacked direction.

I think what led to this problem was how, in most school bands I played in, the kids that could play the highest/fastest solos got the most respect (usually from their peers, not necessarily from the directors). So people that wanted to be noticed (myself included) would practice playing high notes and flashy passages. The Classical Quality of musical performance alone was usually equated with proficiency. Of course, it's hard to practice (and harder to teach) the Romantic Quality of jazz improvisation, especially in isolation. It's best to get real experience with a group that's not totally focused on the attention-getting parts of soloing.

Pirsig's division of ideologies into Classical and Romantic appears to be a good one. It appears to encompass a wide range of situations, including jazz improvisation. (I think that this division can be tied somewhat directly to the structure and function of our brains. I'm interested to find out if this is true.) Some improvisers choose to focus on developing one area more than the other (sometimes to the point of producing totally unbalanced solos), but the best soloists are good at both.

Tuesday, August 01, 2006

Curiosity Rewards Related to Flow

First, here's a diagram I made for a presentation on Artificial Curiosity at the HCI Forum 2006. It shows the simple idea behind the powerful Schmidhuber type of curiosity mechanism:


I had read about Mihaly Csikszentmihalyi's idea of "flow" in the book Satisfaction: The Science of Finding True Fulfillment, by Gregory Berns (which totally sounds like a hokey self-help book, but it's not). According to Wikipedia, "The flow state is an optimal state of intrinsic motivation, where the person is fully immersed in what he or she is doing." Yesterday I came across Jenova Chen's game flOw, which is supposed to be based on the same principle. He has this great diagram in his thesis that explains flow quite simply:


Notice any similarity between this and Schmidhuber's curiosity model? Seems like the same type of mechanism; the optimal situation is somewhere between overly challenging and too simple. Intrinsic rewards for situations with maximum learning progress. Or, in the case of flow, intrinsic rewards for appropriate challenges. This seems like a major isomorphism.

Monday, July 31, 2006

Simulacra and Simulation Related to Online Virtual Worlds

From Simulacra and Simulation, by Jean Baudrillard, 1981, page 1:
"If once we were able to view the Borges fable in which the cartographers of the Empire draw up a map so detailed that it ends up covering the territory exactly (the decline of the Empire witnesses the fraying of this map, little by little, and its fall into ruins, though some shreds are still discernible in the deserts - the metaphysical beauty of this ruined abstraction testifying to a pride equal to the Empire and rotting like a carcass, returning to the substance of the soil, a bit as the double ends by being confused with the real through aging) - as the most beautiful allegory of simulation..."
This particular idea of simulation, a copy of some original thing which might acquire more attention than the original itself, makes me think of what Google Earth might turn into.
"...this fable has now come full circle for us, and possesses nothing but the discrete charm of second-order simulacra.

Today abstraction is no longer that of the map, the double, the mirror, or the concept. Simulation is no longer that of a territory, a referential being, or a substance. It is the generation by models of a real without origin or reality: a hyperreal. The territory no longer precedes the map, nor does it survive it. It is nevertheless the map that precedes the territory - precession of simulacra - that engenders the territory, and if one must return to the fable, today it is the territory whose shreds slowly rot across the extent of the map. It is the real, and not the map, whose vestiges persist here and there in the deserts that are no longer those of the Empire, but ours. The desert of the real itself."
The simulacrum concept (a copy with no original), however, appears to be more similar to our fantasy virtual worlds: Second Life, World of Warcraft, There, Project Entropia.

The major distinction between these two types of online worlds (that one is based on our physical world and the other is totally fictional) might become more pronounced in the future as these places gain in popularity. Each type has its benefits: one is instantly familiar, the other allows more creative freedom. Both seem to have their own place in the foreseeable future. If people someday spend so much time in virtual worlds that the real one is no longer familiar, the Earth-simulation type might fade away.

Thursday, July 13, 2006

Ethical Issues in Advanced Artificial Intelligence

Why should we study intelligence (either artificial or biological) and develop intelligent machines? What is the benefit of doing so? About a year ago I decided my answer to this question:

To the extent that scientific knowledge is intrinsically interesting to us, the knowledge of how our brains work is probably the most interesting topic to study. To the extent that technology is useful to us, intelligent machines are probably the most useful technology to develop.

I have been meaning to write up a more detailed justification of this answer, but now I don't have to, because I just read this great paper... "Ethical Issues in Advanced Artificial Intelligence," by Nick Bostrom. I think this paper provides a clear explanation for developing intelligent machines. It touches on all the important issues, in my opinion.

A few points really resonated with me. Here are some succinct excerpts:
  1. "Superintelligence may be the last invention humans ever need to make."
  2. "Artificial intellects need not have humanlike motives."
  3. "If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness."
The third point, I believe, adequately deals with the issue of intelligent machines going crazy and taking over the world. It's all about motivation. If a machine has no intrinsic motivation to harm anybody, it will not do so. There are some caveats to this, some of which were discussed in the paper. I don't think any of these are insurmountable. First, random exploration during development in an unpredictable world will inevitably cause damage to someone/something. I don't think this is a huge problem, as long as the machine is sufficiently contained during development. Second, a machine with a curiosity-driven motivation system will essentially be able to create arbitrary value systems over time. The solution to this is simply to scale the magnitude of any "curiosity rewards" to be smaller than a hard-wired reward for avoiding harm. Third, a machine that can change its own code might change its motivations into harmful ones. Again, hard-coding a pain signal for any code modifications would help combat the problem. Further, if any critical code or hardware is modified, the whole machine could shut itself down. Of course, malignant or careless programmers might build intelligent machines with harmful motivations, but this is a separate issue from statement #3 above.
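
The reward-scaling idea for the second caveat can be sketched very simply. This is only an illustration under assumed numbers: the constants and the function name are hypothetical, and I've written the hard-wired signal as a penalty for causing harm rather than a reward for avoiding it, which has the same effect on which actions look attractive.

```python
HARM_PENALTY = -10.0   # assumed hard-wired pain signal for causing harm
MAX_CURIOSITY = 1.0    # assumed cap; must be smaller in magnitude than HARM_PENALTY


def total_reward(curiosity_reward, caused_harm):
    """Combine a bounded curiosity reward with a hard-wired harm penalty."""
    # Clamp curiosity so it can never outweigh the harm penalty on a single step.
    bounded = max(-MAX_CURIOSITY, min(MAX_CURIOSITY, curiosity_reward))
    return bounded + (HARM_PENALTY if caused_harm else 0.0)
```

With these numbers, even an extremely "interesting" harmful action comes out negative: total_reward(100.0, True) is 1.0 - 10.0 = -9.0.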

Tuesday, July 11, 2006

July Update

I've been pretty busy with my internship this summer, so I haven't done much work with Verve. It's probably best that I wait till I'm done at IBM, anyway, to avoid any conflicts of interest. The internship itself is going well. I'll post more details when it's over... maybe after we've submitted some things to be published.

Next week is the AAAI 2006 conference in Boston. I'll be there to present a poster in the student abstract program. My abstract is entitled, "Curiosity-Driven Exploration with Planning Trajectories." More details are here.

Wednesday, June 21, 2006

Some Biological Evidence of Curiosity Rewards

I can't stop thinking about curiosity rewards and how interesting the whole concept is. It seems like such a fundamental component of mental development. I can't wait to implement the more powerful model in Verve and run some experiments.

I just came across this article (referring to an article in the latest issue of American Scientist, which I couldn't find) describing some recent biological evidence of a sort of curiosity reward system: http://www.physorg.com/news70030587.html

The original (not free) article is in American Scientist... here is the first paragraph and a good diagram: http://www.americanscientist.org/template/AssetDetail/assetid/50770

Evidently, there is a "25-year-old finding that mu-opioid receptors – binding sites for natural opiates – increase in density along the ventral visual pathway..." These receptors are said to be involved in pain/pleasure processing elsewhere. Based on this evidence, Irving Biederman ran experiments using fMRI on humans viewing various types of images. "Biederman also found that repeated viewing of an attractive image lessened both the rating of pleasure and the activity in the opioid-rich areas."

It seems that the evidence described here might correspond to the more powerful model of curiosity, which says that curiosity rewards are proportional to the reduction in novelty. So, once the stimulus is fully predicted (and no longer novel), the rewards will disappear.
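
Here's a toy sketch of that model, treating "novelty" as prediction error. It is an assumption-laden illustration, not Biederman's model or Verve's code: the predictor is a single number nudged toward a scalar stimulus, and the function name and learning rate are made up for the example.

```python
def curiosity_rewards(stimulus, n_views, learning_rate=0.5):
    """Reward on each viewing = reduction in prediction error from that viewing."""
    prediction = 0.0
    prev_error = abs(stimulus - prediction)
    rewards = []
    for _ in range(n_views):
        # Learn a little: move the prediction toward the stimulus.
        prediction += learning_rate * (stimulus - prediction)
        error = abs(stimulus - prediction)
        # Curiosity reward is how much the error (novelty) dropped.
        rewards.append(prev_error - error)
        prev_error = error
    return rewards


rewards = curiosity_rewards(stimulus=1.0, n_views=5)
# rewards == [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Each repeated viewing is less rewarding than the last, which is the same qualitative pattern as the falling pleasure ratings in the fMRI experiments.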

Here is another good quote: "The brain's craving for a fix motivates humans to maximize the rate at which they absorb knowledge."

Thursday, May 04, 2006

"Building Gods" Video, Robot Limits, and Modifying Our Reward Systems

I just watched this video documentary called "Building Gods." It is a "rough cut to the feature film about AI, robots, the singularity, and the 21st century." I thought it was pretty well made and had a good selection of speakers. It's 1 hr 20 min long. Watch it if you have time:

http://video.google.com/videoplay?docid=1079797626827646234

Some parts talked about how we will need to figure out how to limit AI in some ways. We don't want our superintelligent "artilects" to turn against us, of course. I've thought about this problem often in terms of reinforcement learning. If we build intelligent agents with some sort of reinforcement learning/motivation/drives/goals (which seems to me like the best way to achieve general purpose intelligent agents), we can put limits on how rewarding certain situations are. We can hard-wire reward/pain signals corresponding to the equivalent of Asimov's 3 laws of robotics, for example. Since the motivation system would determine action selection, it could be designed so that bad behavior is painful to the agent. Even with a curiosity reward system (which presumably causes a lot of misbehavior in children), rewards from "interesting" situations could be scaled down if they are harmful to humans.

Rather than hard-wire all kinds of reward signals, there could be just a couple of them related to human approval. The reinforcing approval and disapproval received from a human teacher could be scaled very high compared to other rewards. It would be like training a pet. A well-trained dog is pretty reliable, after all. I think it could be even more effective for a robot. If your housecleaning robot decides to break a vase just to see what would happen, a verbal scolding (which becomes a large pain signal to the robot internally) would ensure that it never happened again. And all of this training would only be necessary once - the learned associations could be transferred to other robots.

A more interesting issue is whether intelligent agents should be allowed to change their own hardware and software. If so, things could get messy. In hope of achieving a higher long-term reward (which is the basis for reinforcement learning), they might modify their internal reward system and overwrite the hard-wired limits. They could remove the reward/pain system associated with Asimov's 3 laws. Everyone knows what happens next. The only good solution I see would be to make it painful for a robot to try to modify itself (or to even think about doing so).

Speaking of modifying reward systems, I think this is an interesting possibility for humans, too, which I've never heard discussed. Futurists always talk about the amazing possibilities down the road, like fully immersive VR, but they usually assume that our brains' reward systems will stay the same. Most of the things predicted in futuristic utopian societies (like vast scientific and technological advancements, upgraded bodies and minds, extensive space exploration, automated everything, etc.) are only rewarding if our brains say so.

What if we could change the hard-wired reward signals within ourselves? What if a person could be altered so that the mere sight of a stapler was the most pleasurable sensation possible? Sounds kind of boring... but so does taking drugs. Why would injecting heroin into your bloodstream be fun? Watching a person do so doesn't look fun. It's fun because it hijacks the brain's reward system, making the behavior much more desirable than other things. So, the problem with the stapler addict would be the same: normal daily activities would lose their appeal, and even the stapler rewards wouldn't be so great after a while. He/she would move on to two staplers, then three, then five, then ten... So goes the hedonic treadmill.

The problem is that the brain quickly adapts to whatever level of reward it receives. People that win the lottery aren't really much happier than the rest of us, once the euphoria wears off. So is there any point in modifying the brain's reward signals? I still think so. The solution isn't merely to change what's rewarding, but to modify the reward processing algorithm itself. It could be made so that it does not adapt to (i.e. get bored with) a given rewarding situation. Or it could just reset itself after a while. So every January 1st, your brain resets the reward adaptation system, and everything from the previous year that has become boring is suddenly exciting again.

Even more extreme, we could just set the internal reward signal (like the TD-error-like signal encoded by midbrain dopamine neurons) to be maximal all the time. Unlike the reward response in drug addicts, it would never adapt. A person in that state would be high constantly and never come down. He/she wouldn't even need fully immersive VR. What would be the purpose? At that point the hedonic treadmill would effectively be defeated.

Ok, admittedly, that would be a twisted existence... but from the point of view of the person with the altered brain, it would be great. Assuming the reward hypothesis is true for humans, I wouldn't be surprised if it actually happens someday. Disclaimer: I'm not recommending that anyone actually try this, either now or in the future. I just think it's a really interesting idea that should be discussed more among futurists. And I have never taken illegal drugs.

Friday, April 28, 2006

HCI Forum 2006 w/ Ray Kurzweil

Here are a couple of pictures from the HCI Forum 2006. One is of Ray Kurzweil trying out my demo application of a curious robot exploring a physically simulated environment. (I'll post pictures of it on the Verve website soon.) The other is of me talking about current models of curiosity in intelligent agents.



Tuesday, April 25, 2006

Curiosity System Changes

I've been meaning to make several changes to Verve's curiosity reward system for a while. Right now it still uses a simple method that only applies to very simple situations: it gives the agent a reward in situations that yield high prediction errors. The new system will only apply rewards in situations that contain learnable information (i.e. those that are neither too predictable nor too unpredictable). This works by making curiosity rewards proportional to the learning progress (i.e. reduction in prediction errors) in a given situation over time. This new method is inspired by the artificial curiosity research of Schmidhuber and Oudeyer.

So now I need a localized representation of estimated learning progress. I currently have a localized representation of uncertainty/confidence. I think the changes will be fairly simple: I just need to add a new linear neural network (the "learning progress estimator") which gets input from the RBF state-action representation and outputs a scalar, localized estimate of learning progress (i.e. reduction in uncertainty). This neural network will be trained with the actual reduction in uncertainty (measured as the change in the uncertainty estimate, which is computed by reading the uncertainty estimate before and after training it).

Finally, instead of being proportional to the prediction uncertainty estimation, curiosity rewards will be proportional to the estimated learning progress.
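Here is a minimal sketch of how that could fit together. It is not Verve's actual code: the names (rbf_features, LinearEstimator, curiosity_reward), the 1-D state, and the delta-rule training are all assumptions made for illustration.

```python
import math


def rbf_features(state, centers, width):
    """Gaussian RBF activations for a 1-D state (hypothetical helper)."""
    return [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]


class LinearEstimator:
    """Linear network: scalar output = weights . features (delta-rule training)."""

    def __init__(self, num_features, learning_rate=0.1):
        self.weights = [0.0] * num_features
        self.learning_rate = learning_rate

    def predict(self, features):
        return sum(w * f for w, f in zip(self.weights, features))

    def train(self, features, target):
        error = target - self.predict(features)
        for i, f in enumerate(features):
            self.weights[i] += self.learning_rate * error * f


def curiosity_reward(features, uncertainty_est, progress_est, observed_error, scale=1.0):
    """Return a reward proportional to the estimated learning progress."""
    # Read the local uncertainty estimate before and after training it
    # on the prediction error that was just observed.
    before = uncertainty_est.predict(features)
    uncertainty_est.train(features, observed_error)
    after = uncertainty_est.predict(features)

    # The actual reduction in uncertainty is the training target for the
    # learning progress estimator.
    progress_est.train(features, before - after)

    # Assumption: the reward is not clipped, so it can go negative when
    # uncertainty increases.
    return scale * progress_est.predict(features)
```

In this sketch both estimators read the same RBF features, so the learning progress estimate is local to the same region of the state-action space as the uncertainty estimate.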

Internship This Summer at IBM Research

This summer I'll be doing an internship at IBM's TJ Watson lab in Yorktown Heights, NY. I'll be working with Charles Peck and the Biometaphorical Computing group, which is currently heavily involved in the Blue Brain project (see here and here). At this point, I think the plan is for me to work on a computational model of the cerebellum.

I'm not sure yet how much I'll be able to work on Verve in my spare time this summer because I don't know if it's too closely related to my work at IBM. I'll post again later when I find out.

Friday, March 31, 2006

Self-Organizing RBF Methods

Lately I've been thinking about various methods for self-organizing RBF centers. Instead of having RBF positions fixed, they could move around over time using an unsupervised learning approach. This would focus resources on the most important parts of the input space.

Below I describe a few different adaptation methods. "x" represents the distance between an RBF and the input point. I tested some of these ideas in a simple PyGame application and included some screenshots.

Method 1
Move the RBF closer to the input point in proportion to x. The farther away the RBF is from the input point, the faster it will move towards it. In other words, given input that stays in the same place over time, all the RBFs will reach it at the same time.

This method can be made more local by only considering RBFs within some radius from the input. The following image shows units being adapted towards the mouse pointer in my PyGame app. Adaptation is proportional to x.



You can see how all of the units beyond a certain radius do not move at all.
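For reference, Method 1 with the optional radius cutoff can be sketched like this. It is a standalone illustration, not the code from my PyGame app; adapt_linear, rate, and radius are hypothetical names.

```python
import math


def adapt_linear(centers, input_point, rate=0.1, radius=None):
    """Move each 2-D center toward input_point by a step proportional to x.

    x is the distance from the center to the input point. Centers farther
    than `radius` (if given) are left where they are.
    """
    moved = []
    for (cx, cy) in centers:
        dx = input_point[0] - cx
        dy = input_point[1] - cy
        x = math.hypot(dx, dy)
        if radius is not None and x > radius:
            moved.append((cx, cy))
            continue
        # A step of length rate * x along the unit direction (dx / x, dy / x)
        # simplifies to rate * (dx, dy).
        moved.append((cx + rate * dx, cy + rate * dy))
    return moved
```

Each call shrinks every affected distance by the same factor (1 - rate), which is why all of the RBFs close in on a stationary input at the same time.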

Method 2
If we adapt the RBF positions based on a Gaussian function...



...the RBFs will move towards the input point faster as they approach it. This is desirable because more distant points will not be affected as much. The following screenshot from my PyGame application demonstrates this using a Gaussian-based adaptation function. I moved the mouse pointer (which determines the input point) around in a smooth curve.



The problem with these approaches is that the RBFs tend to clump together near the inputs... which is fine if your goal is to reach a set of discrete clusters, but not if you're trying to approximate some function. In my case, I'm usually trying to represent a state space for a physically-simulated creature, so I don't want the RBFs to clump together so much. I do want them to move towards the input points (presumably increasing the representation's resolution where it matters), but not so much that they're totally covering the exact same area.
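The Gaussian-based adaptation of Method 2 might look like the sketch below. The exact function and parameter values are assumptions: I use exp(-x^2 / (2 * sigma^2)) for the step length, and the names gaussian_step, adapt_gaussian, rate, and sigma are made up for this example.

```python
import math


def gaussian_step(x, rate=1.0, sigma=50.0):
    """Step length for a center at distance x from the input (assumed form)."""
    return rate * math.exp(-(x * x) / (2.0 * sigma * sigma))


def adapt_gaussian(centers, input_point, rate=1.0, sigma=50.0):
    """Move each 2-D center toward input_point with a Gaussian step length."""
    moved = []
    for (cx, cy) in centers:
        dx = input_point[0] - cx
        dy = input_point[1] - cy
        x = math.hypot(dx, dy)
        if x == 0.0:
            moved.append((cx, cy))  # already on the input point
            continue
        # Clamp the step so a center never overshoots the input point.
        step = min(gaussian_step(x, rate, sigma), x)
        moved.append((cx + step * dx / x, cy + step * dy / x))
    return moved
```

Nothing in this update pushes centers away from each other, so repeated calls with nearby inputs produce the clumping described above.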

Method 3
I tried a hybrid approach which combines the simple linear function of Method 1 and the Gaussian function of Method 2. By simply multiplying the Gaussian and linear functions, I got the following:




This looks like it would help because the RBFs adapt slowly at large distances, quickly at medium distances, then slow down again as they approach the input. This is, in fact, what it does, but I still get the clumping effect I was trying to avoid. The following picture shows the PyGame app using this hybrid approach. It's hard to tell the difference between this and the plain Gaussian method just by looking at the picture; the main difference is in how quickly the RBFs approach the input.
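As a rough sketch of the hybrid step size, assuming the same Gaussian form exp(-x^2 / (2 * sigma^2)) and hypothetical names (hybrid_step, rate, sigma):

```python
import math


def hybrid_step(x, rate=0.5, sigma=50.0):
    """Step length = linear term (x) times Gaussian term (assumed form).

    With rate <= 1 the step is never longer than x, so a center cannot
    overshoot the input point.
    """
    return rate * x * math.exp(-(x * x) / (2.0 * sigma * sigma))


if __name__ == "__main__":
    # Compare step lengths close to, at, and far beyond one sigma.
    for x in (5.0, 50.0, 250.0):
        print(x, hybrid_step(x))
```

With this form the product is largest at x = sigma, which matches the slow, fast, slow behavior: the linear term suppresses motion near the input and the Gaussian term suppresses it far away.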





So now I have a few methods to self-adapt RBF centers. (Of course, these have probably already been described elsewhere, but sometimes it's fun and enlightening to figure these things out yourself.)

The problem remains that the RBFs clump too much. What I need is a good way to keep them a minimum distance apart. A brute-force approach would be to compute the distances between each pair of RBFs, but that's O(n^2) runtime complexity. The classic Kohonen self-organizing map approach would probably work. To implement that I would simply need to add connections between nearby RBFs, and I could use those local connections to force them apart if they get too close.
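For comparison, here is a sketch of the brute-force option only (not the Kohonen-style one). The function name, the single-pass design, and the choice to push both centers apart equally are assumptions for illustration.

```python
import math


def enforce_min_distance(centers, min_dist):
    """Single O(n^2) pass that pushes apart any pair closer than min_dist."""
    pts = [list(c) for c in centers]
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            dx = pts[j][0] - pts[i][0]
            dy = pts[j][1] - pts[i][1]
            d = math.hypot(dx, dy)
            if d >= min_dist:
                continue
            if d == 0.0:
                ux, uy = 1.0, 0.0  # arbitrary direction for coincident centers
            else:
                ux, uy = dx / d, dy / d
            # Move each center half of the missing distance, in opposite directions.
            push = 0.5 * (min_dist - d)
            pts[i][0] -= push * ux
            pts[i][1] -= push * uy
            pts[j][0] += push * ux
            pts[j][1] += push * uy
    return [tuple(p) for p in pts]
```

A single pass does not guarantee that every pair ends up far enough apart, since a later push can undo an earlier one; calling it once per frame would only approximate the constraint.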

Wednesday, March 29, 2006

GDC 2006 Summary

This is a summary of my GDC 2006 experience. Keep in mind that I attended only ~15 out of the several hundred lectures available, and most of the ones I attended were in the programming track.

Overall
GDC 2006 was a continuation of last year's topics. Compared to last year, there was nothing terribly revolutionary (which was to be expected... there's already enough for people to learn without having to worry about, say, yet another hardware platform). With all the new hardware coming out, most of the focus was on preparing developers for the transition to parallel processing. Overall, it was a great conference; I think the GDC is one of the most important conferences available for people interested in real time computer graphics, simulated physics, AI, software development techniques, 3D modeling, etc.

In the game programming world, there was more of a push towards parallel architectures and algorithms. This change started last year with the introduction of new hardware (the Cell chip for the PS3, Ageia's PhysX chip, and multiple core chips for PCs).

Sessions
I attended the following sessions. For some of these I was working as a conference associate; the rest I attended for my own interest. More info on these can be found on the GDC 2006 site.

  • A day long tutorial, "Embodied Agents in Computer Games," by John O'Brien and Bryan Stout.
  • A roundtable discussion, "Technical Issues in Tools Development," moderated by John Walker.
  • A keynote speech, "Building a Better Battlestar," by Ronald D. Moore.
  • "The Next Generation Animation Panel," by Okan Arikan, Leslie Ikemoto, Lucas Kovar, Julien Merceron, Ken Perlin, and Victor Zordan.
  • "High Performance Physics Solver Design for Next Generation Consoles," by Vangelis Kokkevis. This included a demo of 500,000 particles running in real time on the PS3.
  • "Sim, Render, Repeat - An Analysis of Game Loop Architectures," by Michael Balfour and Daniel Martin.
  • The Nintendo keynote speech, "Disrupting Development," by Satoru Iwata. They handed out several thousand free copies of Brain Age for the Nintendo DS.
  • "Serious Games Catch-Up," by Ben Sawyer.
  • "The Game Design Challenge: The Nobel Peace Prize," by Cliff Bleszinski, Harvey Smith, Keita Takahashi, and Eric Zimmerman.
  • "Spore: Preproduction Through Prototyping," by Eric Todd.
  • "Half Weasel, Half Otter, All Trouble: A Postmortem of Daxter for the Sony PSP," by Didier Malenfant.
  • "Physical Gameplay in Half-Life 2," by Jay Stelly.
  • "Behavioral Animation for Next-Generation Characters" (Havok sponsored session), by Jeff Yates.
  • "Crowd Simulation on PS3," by Craig Reynolds. Showed 10,000 fish interacting @60 fps on the PS3.

People
I made a few new connections this year. I talked to Steve Wozniak, Dan Goodman (who works with Kevin Meinert, a former VRAC researcher, at LucasArts), Leslie Ikemoto, and a bunch of people from the conference associates group. I also talked with several people that I have met before, including John Walker (High Voltage Software), Mat Best (Natural Motion), and Ken Stanley.

Meeting with Leslie Ikemoto at GDC 2006

At the GDC last week I met Leslie Ikemoto at a tutorial on embodied agents in video games. She brought up the Nero game during the tutorial, which uses evolved neural networks (specifically, the NEAT algorithm, invented by Ken Stanley) to control soldiers in a real-time strategy game. Since I had used NEAT before and know Ken Stanley, I talked to her briefly during a break.

The next day we talked again about our own research. She showed me a project where she trained an animated character to run around on a platform, seeking goals and avoiding barriers, using reinforcement learning. The character's trajectory was determined by the animation sequence, so the RL part learned to choose the best animation to perform at any given state. Overall, it worked pretty well. She was a little unhappy with the results when the character seemed to wander around idly, even though there was a clear path to the goal. My best guess was that the state space needed higher resolution. I also suggested using a clustering algorithm to group the sensors into the most important parts of the state space.

It was very cool to see RL applied to character animation like this. I think it is a very powerful approach, especially in situations where the character must adapt to changing environments. Hopefully we'll see more of this in the future.

Leslie's colleague, Okan Arikan, whom I also met at the GDC, has done some pretty cool computer graphics research. Check out these explosions!

Monday, March 27, 2006

How Much of the Brain is Understood?

A few days ago, right after talking to Steve Wozniak, my friend Ken Kopecky and I were talking about how much of the brain is understood. He asked me roughly how much of it I thought I understood (which is sort of an ill-posed question, but anyway...), and I said maybe 50%. He replied, "That's a very bold statement, Tyler Streeter." I said, "Ya, I know."

The next day I thought more about that conversation. I realized that I should have qualified my response a bit. I talked to Ken again, saying that I was mainly referring to the brain's functional/computational aspects. I said, "My 50% estimation from the other day was not meant to imply that I am especially intelligent, but that the functional, computational aspects of the brain are not as complex as most people think."

To elaborate a bit, some of the key elements (also described in my notes posted earlier) are probably:
* Data compression/feature extraction (e.g., principal components analysis, clustering)
* Reinforcement learning (e.g., temporal difference learning)
* Planning/imagining (e.g., using a learned predictive model of the world to enable reinforcement learning from simulated experiences)
* Curiosity rewards (e.g., intrinsic rewards from novel, learnable information)
* Temporal processing/short-term memory (e.g., tapped delay lines)

Wednesday, March 22, 2006

Meeting Steve Wozniak at GDC2006



I met Steve Wozniak (cofounder of Apple) today at GDC2006. I showed him some videos of my research, which he enjoyed. He was very eager to meet all of the Conference Associates (the volunteer group of which I am a member).

We talked briefly about the brain and what we currently understand about it. We disagreed about how much is known about the brain. It was a good time.


Friday, March 17, 2006

My Notes on Theoretical and Biological Intelligence

Over the past year I've been keeping a couple of text files that summarize what I know about biological intelligence (from a neuroscience perspective) and theoretical models of intelligence. I am constantly updating these files as I gain more knowledge. I thought it would be good to post them here, just to have a record of what I understood on March 17, 2006.

biological_intelligence_3-17-06.pdf
theoretical_intelligence_3-17-06.pdf

Monday, March 13, 2006

What Does the Cerebellum Do?

The cerebellum automates motor tasks. It learns the temporal relationships between sensory and motor signals. After much practice, motor tasks get transferred to the cerebellum. In other words, well-learned tasks get chunked together as motor programs in the cerebellum. More specifically, these are probably closed-loop policies/options, as defined in the reinforcement learning literature. This whole process frees other areas (in the cerebral cortex) to focus their attention on new tasks, building upon the automatic skills stored in the cerebellum. It enables agents to learn hierarchies of motor skills.

Friday, March 10, 2006

Functions of Predictive Models

What are predictive models good for? Here's what I think:
  • Planning - Without an accurate predictive model, it's impossible to generate simulated experiences. Planning requires a predictive model.
  • Curiosity - The only models of curiosity I've seen so far depend upon prediction. Curiosity is defined as novelty or, even better, the reduction in novelty over time. The only good way to measure novelty in a general way is by comparing the output of a predictive model with reality. This could include a reflexive, metapredictor component that can predict a level of uncertainty about its own predictions.
  • Attention - I think attention is drawn to unpredicted, novel situations (and novelty measurement depends upon predictive models... see Curiosity above). In other words, the limited attention channel is focused on situations with the highest probability of containing useful information. These can be externally driven (unpredicted external stimuli) or internally driven (through planning/simulated experiences, we might come to a novel situation which attracts our attention and may guide external movement toward that situation in the real world).
Note that, in any kind of general purpose intelligent agent, predictive models must be learned from experience.

Monday, February 27, 2006

What is Rewarding?

So, assuming the brain is trying to maximize long-term reward intake... what is a "reward?" I think it can be arbitrarily defined, of course, but biological brains have a set of hard-wired primary reward/reinforcement signals. (I am using "reward" and "reinforcement" interchangeably here, with the assumption that both can represent positive and negative values.) The most obvious ones are: food, sex, and pain. Incoming sensory information that is classified into these categories generates reward signals.

A more interesting hypothetical reward signal comes from novelty. For now, let's call it "novelty rewards." We are drawn to novel situations because of novelty rewards. There is evidence that the same dopamine neurons that fire during unexpected "normal" rewards (e.g., food) also fire in all kinds of novel situations. The book Satisfaction: The Science of Finding True Fulfillment, by Gregory Berns, talks extensively about novelty being a major source of rewards in humans.

But I think there's more to it than simple novelty. If we were rewarded simply by experiencing novel situations, we would be drawn to all kinds of random signals (e.g., static radio signals) like moths to a street lamp. As Juergen Schmidhuber proposed in the early 1990s, a better model of novelty-based rewards is that it is rewarding to experience the reduction in uncertainty over time (see Juergen's papers on curiosity here: http://www.idsia.ch/~juergen/interest.html). Basically, it is rewarding to learn. I'll call this "learning rewards." This whole idea is based on our predictions about the world. The term "novelty," then, means the same thing as "uncertainty" and "unpredictability." Fortunately for computational modelers, uncertainty can easily be quantified using predictive models.

If an agent remains in a situation where predictable information can be gained over time, we can say the following:
  1. Uncertainty/novelty/unpredictability is reduced
  2. Learning occurs (specifically, the predictive model gets better)
  3. "Learning rewards" should be given to the agent
These learning rewards are extremely powerful. They enable very open-ended learning where agents are drawn to situations that contain learnable, predictable information. They drive agents to experiment at the edge of their current knowledge of the world. Since they drive agents to improve their predictive models, they are very beneficial for planning.
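The three steps above can be sketched with a toy predictor. This is only an illustration under simple assumptions: the predictor is a running estimate of a scalar signal, and the learning reward is the drop in prediction error from one step to the next. The names ScalarPredictor and learning_rewards are hypothetical.

```python
class ScalarPredictor:
    """Toy predictive model: a running estimate of the next scalar value."""

    def __init__(self, rate=0.2):
        self.estimate = 0.0
        self.rate = rate

    def observe(self, value):
        """Return the prediction error for `value`, then learn from it."""
        error = abs(value - self.estimate)
        self.estimate += self.rate * (value - self.estimate)
        return error


def learning_rewards(signal, rate=0.2):
    """Reward at each step = previous prediction error - current prediction error."""
    predictor = ScalarPredictor(rate)
    rewards = []
    prev_error = None
    for value in signal:
        error = predictor.observe(value)
        if prev_error is not None:
            rewards.append(prev_error - error)
        prev_error = error
    return rewards


if __name__ == "__main__":
    # A constant (fully learnable) signal: rewards start positive and fade
    # as the predictor stops being surprised.
    print(learning_rewards([1.0] * 10))
```

For a learnable signal the rewards are positive while the model is improving and shrink toward zero once it is predicted well. For pure noise the error never settles, so this kind of reward would hover around zero on average instead of staying high the way a raw novelty reward would.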

Subjectively, it feels rewarding to gain information. When you are working on a crossword puzzle, and suddenly the "aha!" moment hits, it's the resulting transition (from uncertainty to certainty) that feels good.

The Reward Hypothesis

In general, I think the core idea that makes biological brains understandable is rewards. The following "Reward Hypothesis," as described on the RLAI research group's site (http://rlai.cs.ualberta.ca/RLAI/rewardhypothesis.html), summarizes my point:

"That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)."

I think this is a clear description of what the brain is "trying to do." All other major brain functions make the task of reward maximization easier.

Tuesday, January 31, 2006

Hierarchical representations

I've been reading from Principles of Neural Science (2000) by Kandel, Schwartz, and Jessell. (It's the required text for my Neural Basis of Human Movement class.) It's a big, pretty standard textbook. So far I've gained a lot of info on how the motor control centers are organized, something which my previous class (Neurobiology) didn't touch.

The main underlying idea I keep coming back to is hierarchy (both in terms of sensation and action). It makes so much sense. Low-level sensory inputs are combined into higher-level representations. In the cerebral cortex this occurs within modalities at first. Higher up in the hierarchy it occurs across modalities. So the highest levels of the sensory hierarchy represent very abstract states, combining all sensory input. Most of this seems to occur in the occipital, parietal, and temporal lobes.

Similarly, the spinal cord, brainstem, and parts of the frontal cortex appear to form a motor hierarchy. Currently, my best guess is that this hierarchy is arranged as follows, from lowest to highest level: spinal cord, brainstem, primary motor cortex (M1), premotor cortex, supplementary motor area, presupplementary motor area, prefrontal cortex. If the prefrontal cortex is at the highest level, it might form an interface between the sensory and motor hierarchies... if they are totally separate structures (see next paragraph).

One of the big confounding issues for me is whether there are separate sensory and motor hierarchies. It might be possible that there is a single sensory-motor hierarchy. This would mimic the idea of hierarchical policies in reinforcement learning; each module in the hierarchy would be a mapping from state to action, so it would require both sensory and motor pathways.

So it might be a while before I (or anyone) understand the overall organization of the sensory and motor systems. The method of thinking that has been most helpful to me in this kind of situation is this: pick a hypothesis and assume it's true until proven wrong. The brain's overall organization is just so poorly understood that it's easy to get overwhelmed. I enjoy at least having a hypothetical model as a framework.

I plan to start working on an implementable hierarchical model soon. It might use radial basis functions like the current Verve agents do, but maybe not. It would be nice to have a totally self-developing hierarchy that could split nodes and add more resolution in the high-information parts of the state-action space. This is getting a little hard to picture, though, so I might make some drawings...

Wednesday, January 25, 2006

Verve 0.1.0 Release

I put a 0.1.0 source release on SourceForge last night. This is the "thesis" version that was used to generate the results in my MS thesis. Enjoy!

Tuesday, January 24, 2006

Pendulum Swing-Up and Cart-Pole Videos

I posted two videos: one of the pendulum swing-up task, and one of the cart-pole task. Check them out here: http://verve-agents.sourceforge.net/gallery

Version 0.1.0 should be ready within the next few days.

Friday, January 06, 2006

First software release soon

I'm planning on releasing an initial version of the Verve library soon. It will probably be the version I used for my master's thesis experiments. Even though I have lots of ideas for further development, I think I'll just release the "thesis version," mainly to have a snapshot of the software at the time I wrote the thesis. Not sure if I'll even reimplement the XML loading and saving code (which worked a looong time ago but got totally outdated after a ton of architecture changes... so I just commented it out). So look for something like version 0.1.0 soon.

Also, I've recently registered the project on SourceForge as 'verve-agents.' So now the main project website is http://verve-agents.sourceforge.net. (For the record, the old site was http://www.vrac.iastate.edu/~streeter/verve/main.html.) The new site is powered by the PHP-based Dokuwiki software, making it much easier for me to manage than a bunch of HTML files.