I've been meaning to make several changes to Verve's curiosity reward system for a while. Right now it uses a simple method that only works in very simple situations: it rewards the agent whenever prediction errors are high. The new system will only reward situations that contain learnable information, i.e. those that are neither too predictable nor too unpredictable. It does this by making curiosity rewards proportional to learning progress (the reduction in prediction errors) in a given situation over time. This new method is inspired by the artificial curiosity research of Schmidhuber and Oudeyer.
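To make the distinction concrete, here is a minimal sketch (in Python, not Verve's actual code) contrasting the two reward schemes. The situation keys, window size, and function names are my own illustrative assumptions:

```python
from collections import defaultdict, deque

WINDOW = 20  # number of recent prediction errors kept per situation (assumed)

error_history = defaultdict(lambda: deque(maxlen=WINDOW))

def prediction_error_reward(situation, error):
    """Old scheme: reward is simply the current prediction error."""
    return error

def learning_progress_reward(situation, error):
    """New scheme: reward is the recent *decrease* in prediction error.

    Fully predictable situations (error stays near zero) and purely noisy
    ones (error stays high) both show ~zero learning progress, so neither
    gets rewarded.
    """
    history = error_history[situation]
    history.append(error)
    if len(history) < WINDOW:
        return 0.0
    half = WINDOW // 2
    older = sum(list(history)[:half]) / half
    recent = sum(list(history)[half:]) / half
    progress = older - recent          # positive when errors are shrinking
    return max(progress, 0.0)          # no reward for getting worse
```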
So now I need a localized representation of estimated learning progress. I currently have a localized representation of uncertainty/confidence. I think the changes will be fairly simple: I just need to add a new linear neural network (the "learning progress estimator") which takes input from the RBF state-action representation and outputs a scalar, localized estimate of learning progress (i.e. reduction in uncertainty). This network will be trained on the actual reduction in uncertainty, measured as the difference between the uncertainty estimate before and after the uncertainty estimator is updated.
Finally, instead of being proportional to the estimated prediction uncertainty, curiosity rewards will be proportional to the estimated learning progress.
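Putting the pieces together, here is a rough sketch in Python (not Verve's actual code) of how I imagine this working. The class names, the delta-rule update, and the RBF feature vector `phi` are my own illustrative assumptions:

```python
import numpy as np

class LinearEstimator:
    """Linear network over localized RBF state-action features -> scalar."""

    def __init__(self, num_features, learning_rate=0.1):
        self.w = np.zeros(num_features)
        self.lr = learning_rate

    def estimate(self, phi):
        return float(self.w @ phi)

    def train(self, phi, target):
        # Delta rule: move the localized estimate toward the target
        self.w += self.lr * (target - self.estimate(phi)) * phi


def curiosity_reward(phi, prediction_error, uncertainty_estimator, lp_estimator):
    """Return a curiosity reward proportional to estimated learning progress.

    phi: RBF activation vector for the current state-action pair
    prediction_error: the predictor's error on the latest observation
    """
    # Uncertainty estimate before the uncertainty estimator is updated
    before = uncertainty_estimator.estimate(phi)

    # Update the uncertainty estimator toward the observed prediction error
    uncertainty_estimator.train(phi, prediction_error)

    # Actual learning progress = measured reduction in estimated uncertainty
    actual_progress = before - uncertainty_estimator.estimate(phi)

    # Train the learning progress estimator on that measured reduction
    lp_estimator.train(phi, actual_progress)

    # Curiosity reward: proportional to *estimated* learning progress,
    # not to the uncertainty itself
    return max(lp_estimator.estimate(phi), 0.0)
```

Because the RBF features are localized, each update only touches the weights of the basis functions active in the current region, so the learning progress estimate stays local to the situation, matching the localized uncertainty representation that's already in place.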