Saturday, August 28, 2004

Self-Referential Reinforcement Learning

Lately I've been researching new reinforcement learning methods. Genetic algorithms have worked well so far, but I thought it might be good to see what else is out there.

I came across the website of Juergen Schmidhuber, a researcher at the IDSIA lab in Switzerland (www.idsia.ch). A lot of his work deals with "self-referential" learning systems. Instead of having a single hard-coded learning algorithm for some agent, you start with some initial learning "germ" that is able to modify any part of the agent's policy, including the learning system itself. This allows an agent to learn better strategies, to learn better learning strategies, to learn how to learn better learning strategies... An important feature of such a system is that there must be a closed loop somewhere (something like the "strange loops" Hofstadter describes in Gödel, Escher, Bach). You can't simply have a meta-learning algorithm that only modifies the level below it; the system must be totally self-referential so that every part of its learning strategy is open to change.
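To wrap my head around what a closed loop like that would even look like, here's a tiny toy sketch. This is my own illustration, definitely not Schmidhuber's actual formulation; the sizes, names, and the update rule itself are all made up. The key point is that the update rule's own weights live inside the same parameter vector it updates:

    import numpy as np

    rng = np.random.default_rng(0)

    # One flat parameter vector holds BOTH the policy weights and the
    # weights of the update rule. Sizes are arbitrary.
    N_POLICY, N_RULE = 16, 8
    theta = rng.normal(0.0, 0.1, N_POLICY + N_RULE)

    def act(obs, theta):
        # The policy: a linear map from a 4-dim observation to 4 actions.
        w = theta[:N_POLICY].reshape(4, 4)
        return np.tanh(w @ obs)

    def self_update(theta, reward):
        # The learning "germ": an update rule parameterized by part of
        # theta, applied to ALL of theta -- including its own weights.
        # That's the closed loop.
        rule = theta[N_POLICY:]
        direction = np.tanh(np.outer(rule, theta).sum(axis=0))
        return theta + 0.01 * reward * direction

    for step in range(100):
        obs = rng.normal(size=4)
        reward = -float(np.sum(act(obs, theta) ** 2))  # stand-in reward
        theta = self_update(theta, reward)
        # Note: the reward signal itself stays outside theta, so the
        # loop can't rewrite what counts as good or bad.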

Obviously there must be some hard-coded aspects of the system. For example, if reinforcement is provided by the agent's environment, the agent shouldn't be able to learn to misinterpret what counts as good and bad reinforcement; it should continually try to improve itself toward a fixed goal.

My main concern is the problem of getting stuck in local optima. GAs can search through different chunks of the solution space in parallel, but this new method might not allow that. How do humans search through a solution space? We don't have multiple bodies that can try a ton of possible solutions in parallel, though something like this probably occurs in our minds. We create lots of hypotheses and test them against our mental model of the world, then test the best hypothesis against the real world. Predicting outcomes and measuring the actual outcomes against our predictions probably come into play somewhere.
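Something like that inner loop could be sketched as follows. This is purely illustrative; world_model and env_step are stand-ins I'm inventing here, and in a real system the model would itself be learned:

    import numpy as np

    rng = np.random.default_rng(1)

    def world_model(state, plan):
        # Internal "mental model": predicts how good a plan is.
        # Stubbed out here; in practice this would be learned.
        return -float(np.sum((state - np.cumsum(plan)) ** 2))

    def choose_plan(state, n_hypotheses=50, horizon=5):
        # Generate lots of hypotheses and test them all against the
        # mental model -- cheap, parallel search inside the head.
        candidates = [rng.normal(size=horizon) for _ in range(n_hypotheses)]
        scores = [world_model(state, p) for p in candidates]
        return candidates[int(np.argmax(scores))]

    def try_for_real(env_step, state, plan):
        # Only the best hypothesis gets tested against the real world.
        # The gap between prediction and outcome is exactly the
        # "predicted vs. actual" signal for improving the model.
        predicted = world_model(state, plan)
        actual = env_step(state, plan)
        return actual - predicted  # surprise / model error

    state = rng.normal(size=5)
    plan = choose_plan(state)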

I've implemented two such self-referential learning systems so far, both modifying a character's neural net. One is sort of a dynamic programming approach; the other is totally neural net-based. The first didn't work so well, probably because it had to learn a sequential program to adjust a parallel architecture (the character's neural net). The second is a regular neural net with special output nodes that can address, read, and modify any parameter in the net. I'm still experimenting with this. One problem I'm having is that the network usually reaches a stable attractor and just stops changing: no character movement and no changes due to learning. I might add some probabilistic features to keep this from happening. For example, maybe there should always be a non-zero probability that the learning system can modify things at any time.
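For reference, the core of that second system looks roughly like this. It's a simplified sketch; the addressing scheme and sizes are arbitrary choices, and I've left out feeding the addressed weight's value back in as an input (the "read" part):

    import numpy as np

    rng = np.random.default_rng(2)

    N_IN, N_MOTOR = 8, 4
    # Three extra output nodes: an address, a delta, and a write gate.
    W = rng.normal(0.0, 0.5, (N_MOTOR + 3, N_IN))
    EPSILON = 0.05  # always a non-zero chance of modification

    def step(obs):
        global W
        out = np.tanh(W @ obs)
        motor = out[:N_MOTOR]
        addr, delta, gate = out[N_MOTOR], out[N_MOTOR + 1], out[N_MOTOR + 2]
        # Map the address output (in -1..1) to a flat weight index.
        idx = int((addr + 1.0) / 2.0 * (W.size - 1))
        if gate > 0.0:
            W.flat[idx] += 0.1 * delta  # the net modifies itself
        if rng.random() < EPSILON:
            # Forced stochastic tweak, to kick the system out of the
            # "stable attractor, nothing changes" fixed point.
            W.flat[rng.integers(W.size)] += rng.normal(0.0, 0.1)
        return motor

    for t in range(100):
        motor = step(rng.normal(size=N_IN))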

Monday, August 09, 2004

Connectionism...

I've been reading some articles on Wikipedia about language and philosophies of the mind. Here are some links:

http://en.wikipedia.org/wiki/Sapir-Whorf_Hypothesis
http://en.wikipedia.org/wiki/Loglan
http://en.wikipedia.org/wiki/Connectionism

I haven't thought about it much before, but I think I subscribe to the connectionist philosophy. I may be a little biased, though, having studied artificial neural networks a lot recently.

The following quote from the connectionism article describes one of the main differences between connectionism and computationalism:

“Connectionists engage in "low level" modelling, trying to ensure that their models resemble the anatomy of the human brain, whereas computationalists construct "high level" models that do not resemble neurological structure at all...”

I hope to develop a connectionist system with Verve using only biologically plausible components; however, this might not be the most practical approach. At first I might have to use some "high level models" that the neural architecture can learn to use, making the system part connectionist and part computationalist. The end goal would be to use only a neural architecture to control all behaviors at various levels of complexity.
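One way that hybrid stage might look in code (the behavior functions here are placeholders I'm making up, not actual Verve code):

    import numpy as np

    rng = np.random.default_rng(3)

    # Hand-coded "high level models" -- the computationalist part.
    def seek_food(world):  return "seeking food"
    def flee(world):       return "fleeing"
    def wander(world):     return "wandering"

    BEHAVIORS = [seek_food, flee, wander]

    # The connectionist part: a net learns to select among them.
    W = rng.normal(0.0, 0.5, (len(BEHAVIORS), 6))

    def control(sensors, world):
        scores = W @ sensors  # sensors: a 6-dim feature vector
        return BEHAVIORS[int(np.argmax(scores))](world)

    print(control(rng.normal(size=6), world=None))

The hand-coded functions could then be replaced one at a time by learned neural modules until the whole stack is connectionist.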