Sunday, November 07, 2004

November Update

Just a note to say that I'm still doing a lot of background research. I'm currently writing a review paper for a psych class I'm in. It will cover reinforcement learning in natural and artificial neural networks and why biologically-realistic algorithms are important. A lot of the papers I've been reading recently are from Nature Neuroscience and Nature Reviews Neuroscience.

Something my Verve code has been lacking is reward sensing/predicting by the neural network. After reading a few neuroscience papers dealing with reward signals in human brains, I think my artificial neural network should be able to: 1) sense "rewards" (kind of a vague term, but this will probably be programmed explicitly, as in neuralNet->addReward(0.5)), and 2) predict rewards (i.e. some special neurons should output a signal when rewards are expected to be given).
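A minimal sketch of the two capabilities, written in Python for brevity rather than as actual Verve code. The class name `RewardPredictor`, its methods, and the simple delta-rule update are all assumptions made for illustration; the post only names an explicit call along the lines of neuralNet->addReward(0.5).

```python
class RewardPredictor:
    """Hypothetical linear reward predictor trained with a simple delta rule."""

    def __init__(self, num_inputs, learning_rate=0.1):
        self.weights = [0.0] * num_inputs
        self.learning_rate = learning_rate

    def predict(self, inputs):
        # The "special neuron": outputs the reward currently expected for these inputs.
        return sum(w * x for w, x in zip(self.weights, inputs))

    def add_reward(self, inputs, reward):
        # Sense an explicitly supplied reward and move the prediction toward it.
        error = reward - self.predict(inputs)
        for i, x in enumerate(inputs):
            self.weights[i] += self.learning_rate * error * x
        return error  # prediction error: positive when the reward was unexpected


predictor = RewardPredictor(num_inputs=2)
for _ in range(300):
    predictor.add_reward([1.0, 0.0], 0.5)
```

After enough repetitions the predicted reward for the first input pattern approaches the delivered reward, and the prediction error shrinks toward zero.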

Monday, September 20, 2004

Research Update

Lately I've spent my time reading about neuroscience and biologically-realistic learning algorithms (and doing classwork, of course).

I skimmed through "Biophysics of Computation" by Christof Koch. It describes the low-level electrical and chemical properties of neurons and synapses. Pretty good stuff. I especially like how he shows the similarities between certain neural structures and well-known electrical circuits. I'm also reading a bunch of articles out of "The Handbook of Brain Theory and Neural Networks" by Michael Arbib. (I have the 1st edition from the school library, but I'm trying to get the second edition since it's a lot newer and supposedly updated a lot.) It has a lot of good short articles on neural reinforcement learning and motor control.

I'm also reading Russell Smith's (the guy who wrote Open Dynamics Engine) PhD thesis, "Intelligent Motion Control with an Artificial Cerebellum." This contains a great overview of the human motor control system (especially the cerebellum) and other motion control research. He extends a system called the CMAC controller. I can't explain it yet since I just started reading it, but I think it'll be pretty good.

One more thing I'm doing that will probably pertain to Verve... I'm working with some other students on an abstract simulated physics interface (OPAL: Open Physics Abstraction Layer). This will be a high-level interface to physics engines. The purposes of this are: 1) to have an abstract interface that can be extended to work with any physics engine, and 2) to have a simple API that does tons of cool things automatically with relatively few function calls. Open Dynamics Engine, for instance, gives developers total control over a ton of parameters, but it takes a while to learn to use it well. Hopefully OPAL will alleviate this problem. I hope to use OPAL to make a lot of good Verve demos.

Tyler

Saturday, August 28, 2004

Self-Referential Reinforcement Learning

Lately I've been researching new reinforcement learning methods. Genetic algorithms have worked well so far, but I thought it might be good to see what else is out there.

I came across the website of Juergen Schmidhuber, a researcher at the IDSIA lab in Switzerland (www.idsia.ch). A lot of his work deals with "self-referential" learning systems. Instead of having a single hard-coded learning algorithm for some agent, you start with some initial learning "germ" that is able to modify any part of the agent's policy, including the learning system itself. This allows an agent to learn better strategies, to learn better learning strategies, to learn how to learn better learning strategies... An important feature of such a system is that there must be a closed loop somewhere (something like the "strange loops" Hofstadter describes in Goedel, Escher, Bach). You can't simply have a meta-learning algorithm that only modifies the level below it; it must be totally self-referential to be able to change all parts of its learning strategy.

Obviously there must be some hard-coded aspects of the system. For example, if reinforcement is provided from the agent's environment, the agent shouldn't learn to misinterpret what's good and what's bad reinforcement; the agent should continually try to improve itself to meet a fixed goal.

My main concern is the problem of getting stuck in local optima. GAs can search through different chunks of the solution space in parallel, but this new method might not allow that. How do humans search through a solution space? We don't have multiple bodies that can try a ton of possible solutions in parallel, though something like this probably occurs in our minds. We create lots of hypotheses and test them against our mental model of the world, then test the best hypothesis against the real world. Predicting outcomes and measuring the actual outcomes against our predictions probably comes into play somewhere.

I've implemented two such self-referential learning systems so far, both modifying a character's neural net. One is sort of a dynamic programming approach, the other is totally neural net-based. The first didn't work so well, probably because it had to learn a sequential program to adjust a parallel architecture (the character's neural net). The second is a regular neural net with special output nodes that can address, read, and modify any parameter in the net. I'm still experimenting with this. One problem I'm having is that the network usually reaches a stable attractor and just stops changing: no character movement and no changes due to learning. I might add some probabilistic features to keep this from happening. For example, maybe there should always be a non-zero probability that the learning system can modify things at any time.
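The probabilistic idea in the last two sentences could look something like the following Python sketch. The function name `maybe_perturb`, the flat parameter list, and the Gaussian nudge are assumptions for illustration only, not the system described above.

```python
import random


def maybe_perturb(params, rng, probability=0.05, scale=0.1):
    """Return a copy of params; with the given probability, nudge one random entry.

    Keeping the probability above zero means the parameters can always
    still change, even after the network settles into a stable attractor.
    """
    new_params = list(params)
    if new_params and rng.random() < probability:
        index = rng.randrange(len(new_params))
        new_params[index] += rng.gauss(0.0, scale)
    return new_params


rng = random.Random(42)  # seeded only to make runs repeatable
weights = [0.2, -0.7, 0.1]
for _ in range(100):
    weights = maybe_perturb(weights, rng)
```

Calling this once per time step leaves the parameters untouched most of the time but never rules out a change.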

Monday, August 09, 2004

Connectionism...

I've been reading some articles on Wikipedia about language and philosophies of the mind. Here are some links:

http://en.wikipedia.org/wiki/Sapir-Whorf_Hypothesis
http://en.wikipedia.org/wiki/Loglan
http://en.wikipedia.org/wiki/Connectionism

I haven't thought about it much before, but I think I subscribe to the connectionist philosophy. I may be a little biased, though, having studied artificial neural networks a lot recently.

The following quote from the connectionism article describes one of the main differences between connectionism and computationalism:

“Connectionists engage in "low level" modelling, trying to ensure that their models resemble the anatomy of the human brain, whereas computationalists construct "high level" models that do not resemble neurological structure at all...”

I hope to develop a connectionist system with Verve using only biologically-plausible components; however, this might not be the most practical approach. At first I might have to use some "high level models" that the neural architecture can learn to use, making the system part connectionist and part computationalist. The end goal would be to use only a neural architecture to control all behaviors at various levels of complexity.

Monday, July 26, 2004

New Videos

I posted two videos of some almost-stable walking.  Go to the Gallery page on the Verve site to see them.

Tuesday, July 20, 2004

New Thoughts about Python

I met with Michael McLay (starship.python.net/crew/mclay) yesterday for a few hours.  We talked about Python and how I could use it in Verve.  
 
Here's what I'm thinking:  If I only needed a scripting language for the desired behavior files, I would probably stick with Lua; however, I'm thinking more and more about going back and writing most of Verve with a scripting language.  In this case I would probably choose Python since it has a larger community and a ton of existing libraries.  I need to reimplement NEAT anyway, so I might write it (except for the parts that need to be really fast) in Python, too.
 
Of course, I'll need to spend some time learning Python first...
 
Please post comments if you have an opinion on any of this.

Wednesday, July 14, 2004

Update

Since my last post I have added a lot of little things. The most significant additions are:

1. I can now use NEVT (NeuroEvolution Visualization Toolkit http://nevt.sourceforge.net ) to generate SVG files from the neural networks and statistics files.

2. The desired behaviors are written in Lua script files. I'm still not sure if this is the best way to do things. Maybe users should just be able to decide how to implement the "fitness evaluation" functions on their own. They could have the choice of scripting them or deriving evaluator objects from a base class provided by Verve.

3. A high level Training Script XML file (for each behavior) contains info about evolution stages, among other things. It also contains the location of a Lua script containing evaluation functions for that behavior. It might still be better to have a single Lua script per behavior that specifies evolution stages and contains all the functions needed to evaluate the behavior.

Besides that, I have tweaked and rewritten chunks of the NEAT source code.

The main problem I have now is trying to get walking behavior easily. I can get decent results with "training wheels" (forces that keep the body from falling; see the Reil & Massey papers). Once I take off the training wheels, the fittest individuals die off because of the sudden change in the environment. I've tried gradually removing the training wheels, but I haven't had much better results yet.
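One way to express gradual removal is a scale factor on the supporting forces that falls from 1 to 0 over a range of generations. This is a Python sketch only: the function name, the linear ramp, and the generation numbers are assumptions, and nothing here suggests this particular schedule solves the problem.

```python
def support_scale(generation, start, end):
    """Scale factor for the "training wheel" forces at a given generation.

    Full support (1.0) up to `start`, no support (0.0) from `end` onward,
    and a linear ramp in between.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    if generation <= start:
        return 1.0
    if generation >= end:
        return 0.0
    return 1.0 - (generation - start) / (end - start)


# Hypothetical schedule: start fading at generation 10, finish at generation 50.
scales = [support_scale(g, start=10, end=50) for g in range(0, 61, 10)]
```

Each supporting force would be multiplied by this factor before being applied to the body, so the environment changes a little each generation instead of all at once.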

My next big tasks are the following:

1. Get walking behavior to evolve more easily and reliably. This is such a fundamental behavior for simulated humans that I don't want to move to other behaviors until it works well.

2. Reimplement NEAT. I have been using Ken Stanley's NEAT source code for a while. The problem is that his code is under the GPL license, and Verve will use a less-restrictive license. There is a chance that he could change it for me, but that would probably be complicated since others have probably already used it in GPL-license projects. Also, implementing NEAT myself would give me an even better understanding of how the algorithm works.

Tyler

Saturday, June 26, 2004

Desired Behavior Modules & Lua

I've been thinking for a while about how to make the training system work smoothly with the desired behaviors. So far I've been creating a separate C++ class for each desired behavior. This works fairly well, but they usually take a while to tweak. It would be nice to have a separate script file for each desired behavior. So... I've been learning to use Lua. I think Lua scripts will work nicely for this purpose. I won't need anything too fancy, and it seems like Lua is getting pretty popular with game developers.

I hit a roadblock yesterday when I was thinking about how Lua would work with Verve. The users will have full control over their character models and virtual environments, but the desired behaviors (whether scripts or C++ functions/classes) will need to have full access to the character model and possibly the environment to decide how well the character's controller is doing.

The way I did this with my desired behavior C++ classes was this (using "walking" as an example): each frame/time step a pointer to the character would be passed to the Walking class. This class would examine the character's body (how far it had moved forward, how many steps it had taken) and update the fitness score. The problem with doing this in Lua scripts is that the users would have to create all the Lua/C++ bindings to let the script access the character and environment. This would work, but it makes the API uglier.

So I came up with this idea: the user will create a set of metrics to measure in the C++ code. For example, let's say the user is training a simulated robot to stack blocks. Each time step the user could call Robot->GetMetrics() and maybe Environment->GetMetrics(). These return the number of steps the robot has taken, how much energy he has used, the height of his head, the number of blocks stacked up in the environment, etc. Notice that some of the numbers might not be needed for all desired behaviors; they are just a bunch of numbers that might be used. Then, the numbers all get passed to the Stacking Blocks Lua script, which measures only the number of blocks stacked and returns a fitness score. Another Lua script (Jumping) might use other numbers (the head height) to calculate a fitness score.
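The shape of that idea, sketched in Python rather than C++ and Lua: the host code gathers a bag of named numbers, and each desired behavior reads only the ones it needs. The metric names and function names below are made up for illustration.

```python
def stacking_blocks_fitness(metrics):
    # Uses only the number of blocks stacked; ignores every other metric.
    return float(metrics["blocks_stacked"])


def jumping_fitness(metrics):
    # Uses only the head height.
    return float(metrics["head_height"])


# Hypothetical values the robot and environment might report for one time step.
metrics = {
    "steps_taken": 12,
    "energy_used": 3.5,
    "head_height": 1.7,
    "blocks_stacked": 4,
}

stacking_score = stacking_blocks_fitness(metrics)
jumping_score = jumping_fitness(metrics)
```

Because the behaviors see only plain numbers, they need no bindings to the character or environment classes.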

I'll test this method to see how well it works. I want the API to be very clean, and I don't want the users to have to write tons of extra code. To give them the freedom to use their own character code, though, they'll still have to do some extra work.

I think I may also give them the option of using Lua scripts or not. They may just want to write the desired behaviors in C++ anyway (maybe because they don't know/want to learn Lua or because the Lua scripts might slow down the training process).

XML files

A couple of days ago I changed most of the data files to use XML. I'm using the TinyXML parser to load and save these files. Specifically, the files that are now XML files are 1) the NEAT parameters file, 2) the "controller" files (neural networks), and 3) the "Training Script" file. Here's a quick description of each one:

1.) The NEAT parameters file holds all of the genetic algorithm parameters (population size, probabilities of various mutations, etc.).

2.) A controller file holds info about the neurons and connections that make up a neural network.

3.) The Training Script file is sort of a high level description of a training sequence. The data it contains include: the location of the NEAT parameters file, whether to seed the population with a trained controller, the location of that controller file, whether to save the trained controllers, whether to save statistics, the output directory for all output files, how often to save controller files (in terms of generations), how many generations should elapse before quitting, how many runs/repetitions of the whole training sequence to perform, the number of sensors and effectors the trained controller should have, and whether the desired behavior is "oscillatory." This final parameter decides whether the neural nets should initially contain a lot of recurrent connections. Certain oscillatory behaviors (e.g. walking) need a lot of these from the start.

Also, the Training Script contains Training Stages (XML elements) that each describe a different stage in the evolution process (i.e. incremental evolution). Each stage can contain one or more desired behaviors and a fitness goal. For example, to train a controller to walk, the first Training Stage might use an "Oscillate Legs" desired behavior. Once the fitness of a controller reaches the fitness goal for that stage, the training system switches to the next training stage, which could be "Maximize Forward Distance" in the case of walking. Note that each stage can have more than one desired behavior. You could use a "Minimize Energy Expended" desired behavior along with a walking behavior to make the controller learn to walk while minimizing energy used.
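To make the stage switching concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree instead of TinyXML. The element and attribute names (TrainingScript, TrainingStage, DesiredBehavior, fitnessGoal) and the goal values are guesses for illustration, not the real Verve file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical Training Script; the real schema is not shown in this post.
TRAINING_SCRIPT = """
<TrainingScript>
  <TrainingStage fitnessGoal="10.0">
    <DesiredBehavior name="Oscillate Legs"/>
  </TrainingStage>
  <TrainingStage fitnessGoal="25.0">
    <DesiredBehavior name="Maximize Forward Distance"/>
    <DesiredBehavior name="Minimize Energy Expended"/>
  </TrainingStage>
</TrainingScript>
"""


def load_stages(xml_text):
    """Return a list of (behavior names, fitness goal) tuples, in file order."""
    root = ET.fromstring(xml_text)
    stages = []
    for stage in root.findall("TrainingStage"):
        behaviors = [b.get("name") for b in stage.findall("DesiredBehavior")]
        stages.append((behaviors, float(stage.get("fitnessGoal"))))
    return stages


def next_stage_index(stages, current_index, best_fitness):
    """Advance to the next stage once the current stage's fitness goal is reached."""
    _, goal = stages[current_index]
    if best_fitness >= goal and current_index + 1 < len(stages):
        return current_index + 1
    return current_index


stages = load_stages(TRAINING_SCRIPT)
```

The training loop would call next_stage_index after each generation with the best fitness so far, and evaluate controllers against the behaviors listed for the current stage.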

One file type still needs to be switched to XML: the statistics file. This file will contain information about fitness (best and average), neural net complexity (# of neurons and connections), and speciation.

By the way, one of the reasons I chose XML is that a guy named Derek James recently wrote a NeuroEvolution Visualization Toolkit (http://nevt.sourceforge.net). This takes XML files (neural nets and statistics files) and converts them to SVG (scalable vector graphics) files. This should come in handy later.

Friday, June 25, 2004

Old news...

Here are the posts that I made on my old Verve news site before it was replaced by this blog:

Friday, June 18, 2004
Added a Bibliography page.

Thursday, June 17, 2004
An executive summary and technical white paper have been posted in the Documentation section.

Friday, June 11, 2004
The Autonomous Virtual Humans project has been restructured. The new system is much more general in that it will work with any physics engine and any character model, not just humans. The new project has been named Verve and has a new website: www.vrac.iastate.edu/~streeter/verve/main.html.

Verve blog is now online

I just created a new blog on blogger.com for the Verve project. I'll use this space in the future to keep track of ideas I have. Hopefully it will encourage potential Verve users to give me feedback, too. :)