Communications and Signal Processing Seminar

Deep Exploration in Reinforcement Learning: from Thompson Sampling to Randomized Value Functions

Daniel Russo, Assistant Professor, Columbia University Business School

Modern online marketplaces feed themselves: they rely on historical data to optimize content and user interactions, but the data generated by these interactions is fed back into the system and used to optimize future interactions. As this cycle continues, good performance requires algorithms capable of learning through sequential interactions, systematically experimenting to gather useful information, and balancing exploration with exploitation.

This talk will discuss some recent progress toward reliably efficient exploration in reinforcement learning systems. I'll first discuss Thompson sampling, an algorithm that has recently been the focus of much attention in academia and industry. Then, building on the idea of Thompson sampling, I'll introduce a new approach that generates sophisticated exploration by randomly perturbing value function estimates. This approach can be combined with common reinforcement learning algorithms, such as least-squares value iteration and temporal difference learning, which do not maintain a full model of the environment and instead aim to learn a parameterized representation of the value function.

Daniel Russo is an assistant professor in the Decision, Risk, and Operations division of the Columbia Business School. His research lies at the intersection of statistical machine learning and sequential decision-making, and contributes to the fields of online optimization, reinforcement learning, and sequential design of experiments. He joined Columbia in summer 2017, after spending one year as an assistant professor in the MEDS department at Northwestern's Kellogg School of Management and one year at Microsoft Research in New England as a postdoctoral researcher. He received his PhD from Stanford University in 2015, under the supervision of Benjamin Van Roy. In 2011, he received his BS in Mathematics and Economics from the University of Michigan.

Sponsored by

ECE-Systems

Faculty Host

Vijay Subramanian