I discovered yesterday that it really helps to connect the input neurons directly to the output neurons (a skip connection) in the multilayer neural net. Now the value-function neural net learns really well: the temporal difference error almost always settles to within about +/- 0.001 eventually.
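For reference, here's a minimal sketch of what that kind of direct input-to-output connection looks like in a value net. The layer sizes, weight scales, and NumPy setup here are just placeholders, not my actual network:

```python
import numpy as np

n_in, n_hidden = 3, 10          # placeholder sizes, e.g. pendulum state inputs
rng = np.random.default_rng(0)

W_ih = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input -> hidden weights
w_ho = rng.normal(scale=0.1, size=n_hidden)           # hidden -> output weights
w_io = rng.normal(scale=0.1, size=n_in)               # direct input -> output (skip) weights
b_h = np.zeros(n_hidden)
b_o = 0.0

def value(x):
    """Estimated value of state x: hidden-layer term plus a direct linear term."""
    h = np.tanh(W_ih @ x + b_h)
    # The skip connection adds a linear path straight from the inputs to the output,
    # so the net doesn't have to learn the linear part of the value function through
    # the hidden layer.
    return w_ho @ h + w_io @ x + b_o
```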
Now I'm having trouble with the policy/actor neural net. It doesn't seem to improve its performance very much. It'll learn to swing the pendulum a little higher over time, but it never gets high enough to swing straight up. It definitely has enough torque, so that's not the problem. I wonder if it needs more random exploration. I'm currently using a low-pass-filtered noise channel that changes slowly over time to encourage long random actions (see the sketch below). I'll keep trying stuff...
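Here's roughly the idea behind that kind of exploration noise: white noise run through a first-order low-pass filter, so the perturbation drifts slowly instead of jittering every timestep. The filter constant and noise scale below are made-up values, not what I'm actually using:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_slow_noise(alpha=0.98, scale=0.3):
    """Return a function that yields one low-pass-filtered noise sample per call.
    alpha near 1 -> slowly varying noise (long random pushes); alpha=0 -> plain white noise."""
    state = 0.0
    def step():
        nonlocal state
        # exponential moving average of white noise = first-order low-pass filter
        state = alpha * state + (1.0 - alpha) * rng.normal(scale=scale)
        return state
    return step

noise = make_slow_noise()
# each timestep: torque = actor_output(x) + noise()
```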