Hi ,
Tomorrow is Groundhog Day, and what better way to celebrate than by watching (or rewatching) the Bill Murray movie of the same name? As it happens, the film is also a fantastic illustration of reinforcement learning.
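Reinforcement learning is learning through trial and error: an agent tries actions, observes the rewards or penalties that follow, and gradually learns which actions pay off. That is exactly what Bill Murray's character, Phil Connors, does in Groundhog Day.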
For readers who haven't seen the movie in the 30+ years since its release (*Spoiler Alert*): while on assignment in Punxsutawney, Pennsylvania, home of the titular groundhog, Murray's character becomes trapped in a time loop and is forced to relive February 2nd again and again. Fans of the movie estimate that he relives the day more than 12,000 times (around 33 years) before he escapes the loop.
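In reinforcement learning, an agent (the learning program) likewise interacts with an environment again and again, often running through thousands of "episodes" before its training is complete. Each episode begins from a fresh starting state, much as each of Murray's days begins with him waking up in the same bed on February 2nd.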
Murray's actions in Groundhog Day fall into distinct stages. At first, he explores the world in which he is trapped and exploits what he learns to manipulate it to his advantage. However, after realising that this approach leads only to misery and despair, he turns his attention to gaining the knowledge and skills needed to make himself a better person, and uses them to benefit others.
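Training a reinforcement learning agent involves a similar progression: exploration, in which the agent tries out actions to learn about its environment, and exploitation, in which it uses what it has learned to maximise its reward. In practice the two are balanced throughout training rather than strictly separated, with agents typically exploring heavily at first and exploiting more as their knowledge improves.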
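One common way to balance exploration and exploitation is an epsilon-greedy strategy. The sketch below is a minimal, hypothetical example (not taken from any particular library): a toy agent repeatedly chooses one of three actions whose reward probabilities are made-up values hidden from it. With probability epsilon it explores by picking an action at random; otherwise it exploits the action with the best estimated reward so far. Epsilon shrinks linearly over training, so the agent explores heavily at first and exploits more later, loosely echoing Murray's progression.

```python
import random


def run_epsilon_greedy(true_means, episodes=5000, eps_start=1.0, eps_end=0.05, seed=0):
    """Toy epsilon-greedy agent on a multi-armed bandit (hypothetical example).

    true_means: hidden probability that each action yields a reward of 1.
    Returns (estimated reward per action, times each action was chosen,
    per-step flags marking whether that step was an exploration step).
    """
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n   # running average reward observed for each action
    counts = [0] * n        # how many times each action has been tried
    explored = []           # True on steps where the agent explored

    for t in range(episodes):
        # Linearly decay epsilon from eps_start to eps_end over training.
        frac = t / max(1, episodes - 1)
        epsilon = eps_start + (eps_end - eps_start) * frac

        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
            explored.append(True)
        else:
            # exploit: pick the action with the highest estimated reward
            action = max(range(n), key=lambda a: estimates[a])
            explored.append(False)

        # The environment pays out 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < true_means[action] else 0.0

        # Incrementally update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts, explored


if __name__ == "__main__":
    estimates, counts, explored = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated rewards:", [round(e, 2) for e in estimates])
    print("times chosen:", counts)
    print("explore steps, first 500 vs last 500:", sum(explored[:500]), sum(explored[-500:]))
```

The linear decay schedule and the values of eps_start and eps_end are arbitrary choices for illustration; real agents use many different schedules and exploration strategies.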
Groundhog Day ultimately ends with Murray living the perfect day. He wakes to find it is now February 3rd and he has finally escaped the time loop.