Communications and Signal Processing Seminar

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

Chi Jin, Assistant Professor of Electrical and Computer Engineering, Princeton University

WHERE:

Remote/Virtual

WHEN:

Thursday, September 16, 2021 @ 4:00 pm - 5:00 pm
This event is free and open to the public.

ABSTRACT: Modern reinforcement learning (RL) commonly engages practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy. While recent progress in RL theory addresses a rich set of RL problems with general function approximation, such successes are mostly restricted to the single-agent setting. It remains elusive how to extend these results to multi-agent RL, especially in the face of new game-theoretical challenges. This talk considers two-player zero-sum Markov Games (MGs). We propose a new algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension, a new complexity measure adapted from its single-agent version [26]. A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness. Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.

BIO: Chi Jin is an assistant professor of Electrical and Computer Engineering at Princeton University. He obtained his Ph.D. in Computer Science at UC Berkeley, advised by Michael I. Jordan. He received his B.S. in Physics from Peking University. His research interests lie in theoretical machine learning, with special emphasis on nonconvex optimization and reinforcement learning. His representative work includes proving that noisy gradient descent and accelerated gradient descent escape saddle points efficiently, proving sample complexity bounds for optimistic Q-learning and least-squares value iteration, and designing near-optimal algorithms for minimax optimization.

Join Zoom Meeting https://umich.zoom.us/j/92211136360

Meeting ID: 922 1113 6360

Passcode: XXXXXX (Will be sent via e-mail to attendees)

Zoom Passcode information is also available upon request to Katherine Godwin ([email protected]).

See full seminar by Prof Jin

Faculty Host

Lei Ying, Professor of Electrical Engineering and Computer Science, University of Michigan
