Communications and Signal Processing Seminar

Primal-Dual Pi Learning Using State and Action Features

Mengdi Wang, Assistant Professor, ORFE, Princeton University

We survey recent advances in the complexity of, and methods for solving, Markov decision problems (MDPs) with finitely many states and actions, a basic mathematical model for reinforcement learning (RL).
For model reduction of large-scale MDPs in reinforcement learning, we propose a bilinear primal-dual pi learning method that utilizes given state and action features. The method is motivated by a saddle point formulation of the Bellman equation. The sample complexity of bilinear pi learning depends only on the number of parameters and is invariant with respect to the dimension of the problem.
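For orientation, one standard way to write a saddle point formulation of the Bellman equation is sketched below for a discounted MDP. The notation is assumed for illustration and is not taken from the talk, whose exact formulation may differ: discount factor \gamma, initial distribution \xi, reward vectors r_a, transition matrices P_a, hypothetical feature matrices \Phi and \Psi_a, and weight vectors w and \theta.

```latex
% Linear-programming form of the Bellman equation (assumed notation)
\begin{align*}
\min_{v \in \mathbb{R}^{|S|}} \quad & (1-\gamma)\, \xi^\top v \\
\text{s.t.} \quad & v \ge r_a + \gamma P_a v \quad \text{for all } a \in A
\end{align*}

% Equivalent saddle point (Lagrangian) form with dual variables \mu_a \ge 0
\begin{align*}
\min_{v} \; \max_{\mu \ge 0} \; L(v, \mu)
  = (1-\gamma)\, \xi^\top v
  + \sum_{a \in A} \mu_a^\top \bigl( r_a + \gamma P_a v - v \bigr)
\end{align*}

% Illustrative feature parameterization of the primal and dual variables
\begin{align*}
v = \Phi w, \qquad \mu_a = \Psi_a \theta
\end{align*}
```

Under such a parameterization the coupling term of L is bilinear in (w, \theta), so the number of unknowns is the number of feature weights rather than the number of states and actions.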

In the second part we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning the state features from data. The state compression method is able to "sketch" a black-box Markov process from its empirical data and output state features, for which we provide both minimax statistical guarantees and scalable computational tools.
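The general idea of spectral state compression can be illustrated with a minimal Python sketch. This is not the speaker's algorithm: it assumes a plain truncated SVD of the empirical transition matrix, and every name in it (`empirical_transition_matrix`, `spectral_state_features`, `simulate_chain`, the six-state toy chain) is hypothetical.

```python
import numpy as np

def empirical_transition_matrix(trajectory, num_states):
    """Estimate a row-stochastic transition matrix from one observed trajectory."""
    counts = np.zeros((num_states, num_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # States that were never left fall back to a uniform row (an assumption).
    uniform = np.full_like(counts, 1.0 / num_states)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1.0), uniform)

def spectral_state_features(P_hat, rank):
    """Return the top-`rank` left singular vectors as state features."""
    U, S, _ = np.linalg.svd(P_hat, full_matrices=False)
    return U[:, :rank], S

def simulate_chain(P, length, rng):
    """Sample a trajectory from a known chain, standing in for black-box data."""
    states = np.empty(length, dtype=int)
    states[0] = rng.integers(P.shape[0])
    for t in range(1, length):
        states[t] = rng.choice(P.shape[0], p=P[states[t - 1]])
    return states

# Toy chain: 6 states in 2 blocks, so the true transition matrix has rank 2.
rng = np.random.default_rng(0)
membership = np.array([0, 0, 0, 1, 1, 1])
block_P = np.array([[0.9, 0.1], [0.2, 0.8]])
P = block_P[membership][:, membership] / 3.0

trajectory = simulate_chain(P, 5000, rng)
P_hat = empirical_transition_matrix(trajectory, num_states=6)
features, singular_values = spectral_state_features(P_hat, rank=2)
print(features.shape)
print(np.round(singular_values, 3))
```

In this toy setting the singular values beyond the second should be small, which is what suggests a two-dimensional state feature map.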
Mengdi Wang is interested in data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, and the Google Faculty Award. She is currently serving as an associate editor for Operations Research.

Faculty Host

Vijay Subramanian