Regret-based Elicitation of Rewards for Sequential Decision Problems
Speaker: Kevin Regan, University of Toronto
Traditional methods for finding optimal policies in stochastic,
multi-step decision environments require a precise model of both the
environment dynamics and the rewards associated with actions and
their effects. In practice, specifying these rewards precisely is
often a complex and time-consuming process. This
talk will cast the problem of specifying rewards as one of preference
elicitation. We will first discuss how robust policies can be
computed for Markov Decision Processes given partial reward
information using the minimax regret criterion. We will then show how
regret can be reduced by efficiently eliciting reward information
using bound queries. Regret-based elicitation of rewards offers an
efficient way to produce desirable policies without resorting to the
precise specification of the entire reward function, and as such,
opens up a new avenue for the design of everything from personal
software agents to controllers for industrial scheduling problems.
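
As a rough illustration of the two ideas named in the abstract, the sketch
below computes a minimax-regret policy for a toy MDP whose rewards are known
only as intervals, then tightens one interval with a bound query of the form
"is r(s, a) >= b?". It is not the speaker's algorithm: the dynamics, the
interval bounds, and all function names are hypothetical, and the search is
restricted to deterministic policies by brute-force enumeration, which is a
simplification that only suits very small problems.

```python
from itertools import product

GAMMA = 0.9
N_STATES, N_ACTIONS = 2, 2
# P[s][a] = next-state probabilities (hypothetical toy dynamics)
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
START = [1.0, 0.0]  # initial state distribution (assumed)

def occupancy(policy, iters=500):
    """Discounted state-action visitation frequencies of a deterministic policy."""
    f = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    d = START[:]  # state distribution at step t, scaled by GAMMA**t
    for _ in range(iters):
        nxt = [0.0] * N_STATES
        for s in range(N_STATES):
            a = policy[s]
            f[s][a] += d[s]
            for s2 in range(N_STATES):
                nxt[s2] += GAMMA * d[s] * P[s][a][s2]
        d = nxt
    return f

def max_regret(f, all_f, bounds):
    """Worst-case regret of occupancy f over the reward box `bounds`.

    The value of a policy is linear in the reward, so for each rival
    occupancy g the adversary picks the upper bound where g visits more
    than f and the lower bound elsewhere.
    """
    worst = 0.0
    for g in all_f:
        total = 0.0
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                diff = g[s][a] - f[s][a]
                lo, hi = bounds[s][a]
                total += diff * (hi if diff > 0 else lo)
        worst = max(worst, total)
    return worst

def minimax_regret(bounds):
    """Return the deterministic policy with the smallest max regret, and that regret."""
    policies = list(product(range(N_ACTIONS), repeat=N_STATES))
    all_f = [occupancy(p) for p in policies]
    regrets = [max_regret(f, all_f, bounds) for f in all_f]
    best = min(range(len(policies)), key=regrets.__getitem__)
    return policies[best], regrets[best]

def bound_query(bounds, s, a, b, answer_is_yes):
    """Apply the answer to 'is r(s, a) >= b?' by tightening one interval."""
    new = [row[:] for row in bounds]
    lo, hi = new[s][a]
    new[s][a] = (b, hi) if answer_is_yes else (lo, b)
    return new

if __name__ == "__main__":
    bounds = [[(0.0, 1.0), (0.0, 1.0)], [(0.0, 1.0), (0.0, 1.0)]]
    print(minimax_regret(bounds))
    tighter = bound_query(bounds, 1, 1, 0.5, True)
    print(minimax_regret(tighter))
```

Because each answer only shrinks the set of feasible rewards, the minimax
regret after a query can never exceed its value before the query. That
monotonicity is what allows elicitation to stop as soon as regret is small
enough, without specifying the full reward function.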