Off-policy Estimation in Reinforcement Learning
This event is free and open to the public.
In many real-world reinforcement learning applications, access to the underlying dynamic environment is limited to a fixed set of data that has already been collected, with no information about how the data were gathered and no possibility of further interaction with the environment. This is usually called the 'behavior-agnostic off-policy' setting. In this talk, we show that consistent and effective off-policy estimation remains possible in this scenario. Our approach estimates a ratio that corrects for the discrepancy between the stationary state-action distribution of the target policy and the empirical distribution of the data, derived from fundamental properties of the stationary distribution. In addition to providing theoretical guarantees, we present an empirical study of our algorithm applied to off-policy policy evaluation and find that it significantly improves accuracy compared to existing techniques.
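To make the idea concrete, here is a minimal toy sketch of distribution-correction-based off-policy evaluation. All names, distributions, and numbers below are illustrative assumptions, not the speaker's actual algorithm: in the talk's setting the correction ratio must be estimated from data alone, whereas here we assume a small tabular problem where the ratio is known exactly, so we can verify that reweighting samples from the behavior data recovers the target policy's value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action problem. d_D is the empirical
# state-action distribution of the logged data; d_pi is the target
# policy's stationary state-action distribution. Both are assumed
# known here purely for illustration.
d_D = np.array([[0.4, 0.1],
                [0.3, 0.2]])
d_pi = np.array([[0.2, 0.3],
                 [0.1, 0.4]])
reward = np.array([[1.0, 0.0],
                   [0.5, 2.0]])

# The correction ratio w(s, a) = d_pi(s, a) / d_D(s, a). The method
# discussed in the talk estimates this quantity from data; here we
# compute it directly.
w = d_pi / d_D

# Sample logged transitions from the behavior distribution d_D,
# then average the ratio-weighted rewards.
flat_idx = rng.choice(d_D.size, size=200_000, p=d_D.ravel())
s, a = np.unravel_index(flat_idx, d_D.shape)
estimate = np.mean(w[s, a] * reward[s, a])

# Ground-truth value of the target policy: E_{d_pi}[r].
true_value = np.sum(d_pi * reward)
print(f"weighted estimate: {estimate:.3f}, true value: {true_value:.3f}")
```

With enough samples, the ratio-weighted average over data drawn from `d_D` converges to the expected reward under `d_pi`, which is the core identity that behavior-agnostic off-policy estimation exploits.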
Bo Dai is a research scientist at Google Brain. He received the best paper award at AISTATS 2016 and at the NIPS 2017 Workshop on Machine Learning for Molecules and Materials. His research focuses on developing principled (deep) machine learning methods using tools from optimization, especially for reinforcement learning and representation learning for structured data, as well as their applications.