Last month I mentioned that I was running a classical conditioning experiment on my new agents. Given some stimulus at time t and a reward at time t+ISI (where ISI is a constant "inter-stimulus interval"), the agent should learn to predict the time and magnitude of the reward after experiencing the stimulus. In other words, once the agent has been trained, when it sees the stimulus (which was previously neutral but now predicts reward), it should know exactly how long to wait before getting the reward and how big the reward will be.
My current problem is that the agent doesn't have a good way to represent the passage of time, a crucial element in classical conditioning. The agent can predict rewards that immediately follow the stimulus, but if there is a significant gap (ISI) between the stimulus and the reward, the agent quickly forgets about the stimulus. I think I'm about to find some answers in Nathaniel Daw's PhD thesis, which he finished at Carnegie Mellon in 2003. Its title is "Reinforcement Learning Models of the Dopamine System and their Behavioral Implications." It contains a lot of good material, including an extensive literature review and some new ideas about long-term reward predictions. Anyway, I'm just getting into chapter 4, which deals with time representation. I'm hoping to find some ideas there.
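To make the problem concrete, here is a minimal sketch, assuming a linear TD(0) learner and a single fixed trial layout. This is only an illustration, not my agent's actual architecture, and every name and parameter in it (`N_STEPS`, `STIM_T`, `ISI`, `run_td`, the learning rate, and so on) is made up for the example. It compares a stimulus feature that is active only at onset with a tapped delay line (one feature per time step since the stimulus), which is one standard idea from the TD literature and not necessarily what the thesis proposes.

```python
import numpy as np

# Hypothetical trial layout (all values are assumptions for illustration).
N_STEPS = 10      # time steps per trial
STIM_T = 2        # stimulus onset
ISI = 4           # inter-stimulus interval
REWARD = 1.0      # reward magnitude, delivered at STIM_T + ISI


def make_rewards():
    """Reward vector for one trial: a single reward at STIM_T + ISI."""
    rewards = np.zeros(N_STEPS)
    rewards[STIM_T + ISI] = REWARD
    return rewards


def onset_only_features():
    """One feature that is active only on the step the stimulus appears."""
    features = np.zeros((N_STEPS, 1))
    features[STIM_T, 0] = 1.0
    return features


def delay_line_features():
    """Tapped delay line: feature k is active k steps after stimulus onset."""
    n_taps = N_STEPS - STIM_T
    features = np.zeros((N_STEPS, n_taps))
    for k in range(n_taps):
        features[STIM_T + k, k] = 1.0
    return features


def run_td(features, rewards, n_trials=500, alpha=0.1, gamma=1.0):
    """Linear TD(0) over repeated identical trials.

    Returns the learned value estimate for every time step of the trial.
    """
    weights = np.zeros(features.shape[1])
    for _ in range(n_trials):
        for t in range(N_STEPS - 1):
            v_now = features[t] @ weights
            v_next = features[t + 1] @ weights
            # The reward at step t+1 is received on the transition t -> t+1.
            delta = rewards[t + 1] + gamma * v_next - v_now
            weights += alpha * delta * features[t]
    return features @ weights


rewards = make_rewards()
values_onset = run_td(onset_only_features(), rewards)
values_delay = run_td(delay_line_features(), rewards)

print("onset-only :", np.round(values_onset, 2))
print("delay line :", np.round(values_delay, 2))
```

Under these assumptions, the onset-only learner never picks up the delayed reward: by the time the reward arrives, no feature is active to take credit for it. With the delay line, the value estimate stays near the reward magnitude from stimulus onset until the reward arrives and then drops to zero, so both the timing and the size of the reward are represented. The undiscounted setting (`gamma=1.0`) is chosen only to keep the numbers easy to read.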