Speaker: Pascal Poupart

In this presentation, I will talk about the oldest form of Reinforcement Learning (RL): Bayesian Reinforcement Learning (BRL). That's right, BRL was formalized in the 1960s by Howard and his students at MIT, well before Q-learning was developed or RL even became a subarea of Machine Learning in the 1980s. Ironically, BRL is now considered a new form of RL and is gaining popularity for a number of reasons. I will explain how BRL can be viewed as an optimal form of active learning in RL, since it provides a principled solution to the classic exploration-exploitation tradeoff. Furthermore, BRL facilitates the explicit encoding of prior knowledge, which reduces the amount of data needed to learn a good policy. On the other hand, BRL algorithms are computationally much more complex than classic RL algorithms. In the second part of my presentation, I will discuss recent advances that have shown how BRL leads to a special class of partially observable Markov decision processes (POMDPs), as well as an effective point-based value iteration algorithm called BEETLE.
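To give a concrete flavor of how a Bayesian posterior resolves the exploration-exploitation tradeoff, here is a minimal sketch using Thompson sampling on a Bernoulli bandit. This is an illustrative technique chosen for brevity, not the BEETLE algorithm from the talk: the agent keeps a Beta posterior over each arm's unknown success probability, and exploration emerges from posterior uncertainty rather than from an ad hoc schedule.

```python
import random

def thompson_sampling(true_probs, n_steps=2000, seed=0):
    """Bernoulli bandit with independent Beta(1,1) priors per arm.

    Illustrative sketch: posterior sampling drives exploration, so no
    explicit epsilon-greedy schedule is needed.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # prior successes + 1 (Beta parameters)
    beta = [1] * k   # prior failures + 1
    total_reward = 0
    for _ in range(n_steps):
        # Draw one plausible success rate per arm from its posterior,
        # then act greedily on the draws: uncertain arms sometimes
        # produce high samples and get explored automatically.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate Bayesian update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward

alphas, betas, reward = thompson_sampling([0.2, 0.5, 0.8])
best = max(range(3), key=lambda i: alphas[i] / (alphas[i] + betas[i]))
```

After enough steps, the posterior mean identifies the best arm and most pulls concentrate on it; full BRL extends this idea from bandits to sequential MDPs, where the posterior over model parameters becomes part of a POMDP state.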