Towards Global Reinforcement Learning
Speaker: Milen Pavlov
Sequential decision making under uncertainty is a ubiquitous problem.
In everyday situations we are faced with a series of decisions that
aim to maximize the probability of achieving some goal. However,
decision outcomes are often uncertain and it is not always immediately
evident how to determine if one decision is better than another. The
Reinforcement Learning framework overcomes this difficulty by learning
to make optimal decisions based on interactions with the environment.
One drawback of Reinforcement Learning is that it requires too much
data (interactions) to learn from scratch. For this reason, current
approaches attempt to incorporate prior information in order to
simplify the learning process. However, this is usually accomplished
by making problem-specific assumptions, which limit generalizability
of the approaches to other problems. This thesis presents the first
steps towards a new framework that incorporates and exploits a broad
prior knowledge in a principled way. It uses Constraint Satisfaction
and Bayesian techniques to construct and update a belief over the
environment, as well as over good decisions. This allows for
incorporating broad types of prior knowledge without limiting
generalizability. Preliminary experiments show that the framework's
algorithms work well on toy problems in simulation and encourage
further research on real-world problems.