Communications and Signal Processing Seminar

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State

Ben Van Roy, Professor of Electrical Engineering and of Management Science & Engineering, Stanford University

Abstract: We design a simple reinforcement learning agent that, with a specification only of suitable internal state dynamics and a reward function, can operate with some degree of competence in any environment. The agent maintains visitation counts and value estimates for each state-action pair. The value function is updated incrementally in response to temporal differences and optimistic boosts that encourage exploration. The agent executes actions that are greedy with respect to this value function. We establish a regret bound demonstrating convergence to near-optimal per-period performance, where the time taken to achieve near-optimality is polynomial in the number of internal states and actions, as well as the reward averaging time of the best policy within the reference policy class, which comprises those policies that depend on history only through the agent's internal state. Notably, there is no further dependence on the number of environment states or on mixing times associated with other policies or statistics of history.
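The abstract outlines the agent's core loop: count visits, update value estimates via temporal differences plus an optimistic boost, and act greedily. The sketch below is an illustrative rendering of that loop, not the paper's exact algorithm; the class name, step size, discount factor, and the count-based form of the boost are all assumptions made for the example.

```python
import math

class OptimisticTDAgent:
    """Illustrative sketch of the agent described in the abstract:
    tabular value estimates over (internal state, action) pairs,
    updated by temporal differences plus an optimistic boost that
    shrinks with visitation count, with greedy action selection.
    Hyperparameters and the boost form are assumptions, not the
    paper's exact scheme."""

    def __init__(self, n_states, n_actions,
                 step_size=0.1, discount=0.95, boost=1.0):
        # Value estimates and visitation counts per state-action pair.
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.counts = [[0] * n_actions for _ in range(n_states)]
        self.alpha = step_size
        self.gamma = discount
        self.boost = boost

    def act(self, state):
        # Greedy with respect to the current value estimates.
        row = self.Q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        self.counts[state][action] += 1
        # Temporal-difference error toward the observed transition.
        td = reward + self.gamma * max(self.Q[next_state]) - self.Q[state][action]
        # Optimistic boost decaying with visitation count, which
        # encourages exploration of rarely tried pairs (assumed form).
        bonus = self.boost / math.sqrt(self.counts[state][action])
        self.Q[state][action] += self.alpha * (td + bonus)
```

A short usage sketch: after observing a transition, call `update(state, action, reward, next_state)`, then select the next action with `act(next_state)`. Because the boost decays as a pair is visited more often, the value estimates converge toward ordinary temporal-difference targets while untried pairs remain attractive.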

Speaker Bio: Benjamin Van Roy is a Professor at Stanford University. His research focuses on reinforcement learning. Beyond academia, he leads a DeepMind research team in Mountain View and has also led research programs at Unica (acquired by IBM), Enuvis (acquired by SiRF), and Morgan Stanley. He is a Fellow of INFORMS and IEEE.

Related papers

Join Zoom Meeting

Meeting ID: 975 9857 1292

Passcode: XXXXXX (Will be sent via email to attendees)

Zoom Passcode information is also available upon request to Shelly (Michele) Feldkamp ([email protected]).

See the full seminar by Professor Van Roy