Communications and Signal Processing Seminar

Rethinking the theoretical foundation of reinforcement learning

Nan Jiang
Assistant Professor of Computer Science
University of Illinois at Urbana-Champaign
3427 EECS Building

Abstract: Given two candidate functions, can we identify which one is the true value function of a large Markov decision process (MDP), given a “benign” dataset? Trivial as it might seem, a version of the question was open for 30+ years in reinforcement learning (RL), and the core difficulties are intimately related to the training instability of modern deep RL. In this talk, I will argue that by rethinking fundamental questions like this, RL theory can provide unique perspectives and solutions to practically relevant problems that are critical to the deployment of RL in real-world scenarios. The first part of the talk concerns holdout validation in offline RL, where the aforementioned question naturally arises. I will show how our algorithm, Batch Value-Function Tournament (BVFT), breaks the theoretical barrier and shows promising empirical performance. The second part of the talk is about offline training: when we learn policies from a pre-collected dataset, how do we reason about policies that would visit states not seen in the data, and how do we avoid over-estimation? I will present the Bellman-consistent pessimism framework, whose extension gives a surprising unification of offline RL and imitation learning.

Bio: Nan Jiang is an assistant professor of Computer Science at the University of Illinois at Urbana-Champaign. Prior to joining UIUC, he was a postdoctoral researcher at Microsoft Research NYC. He received his Ph.D. in Computer Science and Engineering from the University of Michigan. His research interests lie in the theory of reinforcement learning. Specific research topics include sample complexity of exploration under function approximation, offline RL and evaluation, and learning in partially observable systems. He is the recipient of Best Paper Awards at AAMAS 2015 and ICML 2022, the Adobe Data Science Award in 2021, and the NSF CAREER Award in 2022.

*** This event will take place in a hybrid format. The location for in-person attendance will be room 3427 EECS. Attendance will also be available via Zoom.

Zoom passcode information is available upon request from Sher Nickrand.


Faculty Host

Lei Ying
Professor, Electrical Engineering and Computer Science
University of Michigan, Electrical and Computer Engineering