Communications and Signal Processing Seminar

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous, Value-Based Reinforcement Learning Algorithms

Siva Theja MaguluriFouts Family Early Career Professor and Assistant ProfessorISYE @ Georgia Tech
WHERE:
Remote/Virtual
SHARE:

Abstract:  In this work, we develop a unified framework to obtain finite-sample and/or finite-time convergence bounds for various model-free, value based Reinforcement Learning (RL) algorithms. These algorithms are naturally asynchronous, and we view them as Markovian Stochastic Approximation (SA) algorithms to solve Bellman fixed-point equations. We develop a Lyapunov framework and obtain mean square error bounds on the convergence of a general class of Markovian SA algorithms. Based on this central result, we establish finite-sample bounds for Q-learning, n-step TD and TD(\lambda). As a by-product, we demonstrate a bias-variance trade-off, and theoretically explain the well-known empirical observation on efficiency of bootstrapping in TD(\lambda) and n-step TD. We also study off-policy TD algorithms including V-trace, which is used in Deepmind’s street learn project. The key proof techniques that we exploit are using a generalized Moreau envelope as a smooth potential/ Lyapunov function, and exploiting fast mixing of underlying Markov chain.

The talk is based on the following two papers:

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam, “A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants,” Arxiv Version.

Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, and Karthikeyan Shanmugam, “Finite-Sample Analysis of Contractive Stochastic Approximation Using Smooth Convex Envelopes,” Neurips 2020, Arxiv Version.

Speaker Bio:  Siva Theja Maguluri is Fouts Family Early Career Professor and Assistant Professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech. Before that, he was a Research Staff Member in the Mathematical Sciences Department at IBM T. J. Watson Research Center. He obtained his Ph.D. and MS in ECE as well as MS in Applied Math from UIUC, and B.Tech in Electrical Engineering from IIT Madras. His research interests span the areas of Control, Optimization, Algorithms and Applied Probability. In particular, he works on Reinforcement Learning theory, scheduling, resource allocation and revenue optimization problems that arise in a variety of systems including Data Centers, Cloud Computing, Wireless Networks, Block Chains, Ride hailing systems, etc. He is a recipient of the biennial “Best Publication in Applied Probability” award in 2017, second place award at INFORMS JFIG best paper competition in 2020, “CTL/BP Junior Faculty Teaching Excellence Award” in 2020 and “Student Recognition of Excellence in Teaching: Class of 1934 CIOS Award” in 2020.

Join Zoom Meeting https://umich.zoom.us/j/97598571292

Meeting ID: 975 9857 1292

Passcode: XXXXXX (Will be sent via email to attendees)

Zoom Passcode information is also available upon request to Shelly Feldkamp ([email protected]).

See full seminar by Siva Theja Maguluri