Towards Bayesian Reinforcement Learning
Speaker: Pascal Poupart
In this presentation, I will talk about the oldest form of
Reinforcement Learning (RL): Bayesian Reinforcement Learning (BRL).
That's right, BRL was formalized in the 1960s by Howard and his
students at MIT, well before Q-learning was developed or RL even
became a subarea of Machine Learning in the 1980s. Ironically, BRL is
now considered a new form of RL and is gaining popularity for a number
of reasons. I will explain how BRL can be viewed as an optimal form
of active learning in RL since it provides a principled solution to
the classic exploration-exploitation tradeoff. Furthermore, BRL
facilitates the explicit encoding of prior knowledge, which reduces
the need for data to learn a good policy. On the other hand, BRL
algorithms are computationally much more complex than classic RL
algorithms. In the second part of my presentation, I will talk about
recent advances showing that BRL can be cast as a special class of
partially observable Markov decision processes (POMDPs), a view that
leads to an effective point-based value iteration algorithm called BEETLE.
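The abstract makes two technical claims: BRL resolves the exploration-exploitation tradeoff optimally, and it turns learning into planning over beliefs (the POMDP view). A minimal sketch can illustrate both on the simplest possible case. This is not BEETLE. It assumes a single-state problem (a two-armed Bernoulli bandit) instead of a full MDP, with conjugate Beta priors so that each belief is just a pair of pseudo-counts per arm. All function names are hypothetical. The belief plays the role of the POMDP's hidden-state estimate, and exact dynamic programming over beliefs yields the Bayes-optimal value. The sketch compares that value with a purely greedy policy that never explores deliberately.

```python
from functools import lru_cache
from typing import Callable, Tuple

# Hypothetical setup: a belief (hyperstate) holds one Beta(alpha, beta)
# pseudo-count pair per arm of a Bernoulli bandit (a one-state BRL problem).
Belief = Tuple[Tuple[int, int], ...]


def posterior_mean(counts: Tuple[int, int]) -> float:
    """Expected success probability under a Beta(alpha, beta) belief."""
    alpha, beta = counts
    return alpha / (alpha + beta)


def update(belief: Belief, arm: int, success: bool) -> Belief:
    """Bayesian update: a Beta prior stays Beta, so we just increment a count."""
    alpha, beta = belief[arm]
    new = (alpha + 1, beta) if success else (alpha, beta + 1)
    return belief[:arm] + (new,) + belief[arm + 1:]


def q_value(belief: Belief, arm: int, horizon: int,
            value_fn: Callable[[Belief, int], float]) -> float:
    """Expected reward of pulling `arm` now, then following `value_fn`."""
    p = posterior_mean(belief[arm])
    return (p * (1.0 + value_fn(update(belief, arm, True), horizon - 1))
            + (1.0 - p) * value_fn(update(belief, arm, False), horizon - 1))


@lru_cache(maxsize=None)
def bayes_optimal_value(belief: Belief, horizon: int) -> float:
    """Exact Bayes-optimal value: plan over all reachable future beliefs."""
    if horizon == 0:
        return 0.0
    return max(q_value(belief, arm, horizon, bayes_optimal_value)
               for arm in range(len(belief)))


@lru_cache(maxsize=None)
def greedy_value(belief: Belief, horizon: int) -> float:
    """Value of always pulling the arm with the highest posterior mean."""
    if horizon == 0:
        return 0.0
    arm = max(range(len(belief)), key=lambda a: posterior_mean(belief[a]))
    return q_value(belief, arm, horizon, greedy_value)


if __name__ == "__main__":
    # Prior knowledge about arm 0 is encoded as if 10 pulls had been observed;
    # arm 1 has an uninformative Beta(1, 1) prior.
    prior: Belief = ((6, 4), (1, 1))
    for h in (1, 5, 10):
        print(h, bayes_optimal_value(prior, h), greedy_value(prior, h))
```

Because the Bayes-optimal planner maximizes expected reward over every policy in the belief MDP, its value can never fall below the greedy policy's value. Any gap between the two reflects the worth of information gathered by exploring the uncertain arm. Informative prior counts such as `(6, 4)` show how prior knowledge enters explicitly and reduces how much the agent must learn from data. Exact enumeration of beliefs grows quickly with the horizon and with the number of states. This growth is the computational burden the abstract mentions, and it is why approximate solvers such as point-based value iteration are needed for real MDPs.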