Regret-based Elicitation of Rewards for Sequential Decision Problems

Speaker: Kevin Regan, University of Toronto

Traditional methods for finding optimal policies in stochastic, multi-step decision environments require a precise model of both the environment dynamics and the rewards associated with actions and their effects. In practice, specifying these rewards precisely is often a complex and time-consuming process. This talk will cast the problem of specifying rewards as one of preference elicitation. We will first discuss how robust policies can be computed for Markov Decision Processes given partial reward information using the minimax regret criterion. We will then show how regret can be reduced by efficiently eliciting reward information using bound queries. Regret-based elicitation of rewards offers an efficient way to produce desirable policies without precisely specifying the entire reward function, and as such opens up a new avenue for the design of everything from personal software agents to controllers for industrial scheduling problems.
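
To make the two ideas in the abstract concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it assumes a one-step decision problem (a single state, deterministic action choice) in which each action's reward is only known to lie in an interval, whereas the talk addresses full Markov Decision Processes. The names `max_regret`, `minimax_regret`, and `apply_bound_query`, and the example numbers, are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Assumption: each action's reward is known only up to an interval (lower, upper).
Bounds = Dict[str, Tuple[float, float]]


def max_regret(action: str, bounds: Bounds) -> float:
    """Worst-case loss of choosing `action` over all rewards consistent with `bounds`.

    An adversary sets the chosen action's reward to its lower bound and the
    best rival's reward to its upper bound.
    """
    lower = bounds[action][0]
    rival_uppers = [hi for a, (_, hi) in bounds.items() if a != action]
    return max(0.0, max(rival_uppers, default=lower) - lower)


def minimax_regret(bounds: Bounds) -> Tuple[str, float]:
    """Return the action with the smallest max regret, and that regret."""
    best = min(bounds, key=lambda a: max_regret(a, bounds))
    return best, max_regret(best, bounds)


def apply_bound_query(bounds: Bounds, action: str, b: float, answer_yes: bool) -> Bounds:
    """Tighten an interval using the answer to "is the reward of `action` at least b?"."""
    lo, hi = bounds[action]
    updated = dict(bounds)
    updated[action] = (max(lo, b), hi) if answer_yes else (lo, min(hi, b))
    return updated


# Hypothetical example: three actions with imprecisely known rewards.
bounds = {"a": (0.0, 10.0), "b": (5.0, 6.0), "c": (1.0, 3.0)}
before = minimax_regret(bounds)

# The user answers "no" to the bound query "is the reward of a at least 5?".
bounds_after = apply_bound_query(bounds, "a", 5.0, answer_yes=False)
after = minimax_regret(bounds_after)

print(before, after)
```

In this toy instance a single bound query is enough to drive the minimax regret of the recommended action to zero, even though the rewards of all three actions remain imprecise. That is the sense in which elicitation can stop well before the reward function is fully specified.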