Other Seminar

Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL

Andrew Wagenmaker, Postdoctoral Researcher, Electrical Engineering and Computer Science, UC Berkeley
WHERE:
1303 EECS Building

Abstract: To mitigate the sample complexity of real-world reinforcement learning, common practice is to first train a policy in a simulator, where samples are cheap, and then deploy this policy in the real world, with the hope that it generalizes effectively. Such direct sim2real transfer is not guaranteed to succeed, however, and when it fails, it is unclear how to best utilize the simulator. In this talk, we take steps towards developing a principled set of techniques for such regimes. We consider two settings: realizable settings, where the simulator can accurately model the real environment under the correct (but initially unknown) simulator parameters, and unrealizable settings, where no simulator parameters may exist for which the simulator effectively models the real environment. We argue that in both settings, even when naive sim2real transfer fails, simulators can still be used to substantially speed up real-world RL by enabling efficient exploration. We show theoretically that this can yield an exponential improvement in sample complexity compared to learning without a simulator, and provide a variety of real-world robotic results demonstrating the effectiveness of our approaches in practice.

Bio: Andrew Wagenmaker is a postdoctoral researcher at UC Berkeley working with Sergey Levine. Previously, he completed a PhD in Computer Science at the University of Washington, where he was advised by Kevin Jamieson. He has also spent time at Microsoft Research, mentored by Dylan Foster, as well as at the Simons Institute, and his work has been supported by an NSF Graduate Research Fellowship. His research centers on developing learning-based algorithms for decision-making in sequential environments. In particular, much of his work has focused on obtaining better-than-worst-case guarantees for reinforcement learning and learning in dynamical systems, and on algorithms that provably adapt to the difficulty of, and perform optimally on, each particular problem instance.