The purpose of this function, as explained in the article, is basically the same as that of planning in reinforcement learning:
Moving around in the world exposes organisms to danger, so as a rule they should have as few experiences as possible and learn as much from each as they can. Although some of life's lessons are learned in the moment ("Don't touch a hot stove"), others become apparent only after the fact ("Now I see why she was upset. I should have said something about her new dress"). Time travel allows us to pay for an experience once and then have it again and again at no additional charge, learning new lessons with each repetition.
Concerning how often we switch into "planning mode":
Perhaps the most startling fact [... is ...] how often it does it. Neuroscientists refer to it as the brain's default mode, which is to say that we spend more of our time away from the present than in it.
I agree. And that's how agents in the Verve library are designed to work when in planning mode: for each "real" time step, the agent simulates many time steps (the default is 50, if I remember correctly) using its learned predictive model.
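To make that loop concrete, here's a minimal Dyna-Q-style sketch in Python: each real transition updates both the value estimates and a learned one-step model, and the agent then replays the model some number of extra times before acting again. The class and names below are my own illustration of the general technique, not Verve's actual API.

```python
import random
from collections import defaultdict

class DynaAgent:
    """Hypothetical sketch of model-based planning in the Dyna-Q style.
    Not Verve's API; planning_steps=50 mirrors the default mentioned above."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 planning_steps=50):
        self.q = defaultdict(float)   # Q(s, a) value estimates
        self.model = {}               # learned model: (s, a) -> (reward, next state)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.planning_steps = planning_steps

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def _update(self, s, a, r, s2):
        # One-step Q-learning backup.
        best_next = max(self.q[(s2, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def step(self, s, a, r, s2):
        # Pay for the experience once: learn from the single real transition
        # and record it in the predictive model.
        self._update(s, a, r, s2)
        self.model[(s, a)] = (r, s2)
        # Then have it "again and again at no additional charge": simulate
        # many remembered transitions per real time step.
        for _ in range(self.planning_steps):
            sp, ap = random.choice(list(self.model))
            rp, sp2 = self.model[(sp, ap)]
            self._update(sp, ap, rp, sp2)
```

The key design point is the ratio in `step`: one interaction with the (dangerous, expensive) world buys 50 cheap updates against the remembered model, which is exactly the trade-off the quoted passage describes.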