Dissertation Defense

Provable and Efficient Algorithms for Safe Reinforcement Learning

Honghao Wei
3316 EECS Building



Safe reinforcement learning (RL) is an area of research focused on developing algorithms and methods that ensure the safety of RL agents during learning and decision-making processes. The goal is to enable RL agents to interact with their environments and learn optimal behavior while avoiding actions that may lead to harmful or undesired consequences.

The first part of this talk will explore the design of a model-free, simulator-free algorithm for episodic constrained Markov decision processes (CMDPs). The algorithm is named Triple-Q because it includes three key components: a Q-function for the cumulative reward, a Q-function for the cumulative utility of the constraint, and a virtual Queue that overestimates the cumulative constraint violation. The algorithm guarantees sublinear regret and zero constraint violation when the total number of episodes is sufficiently large.
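To make the three components concrete, here is a minimal toy sketch of a Triple-Q-style update loop. The environment (2 states, 2 actions, horizon 3, the reward/utility tables) and all parameter values (`rho`, `eta`, `alpha`) are illustrative assumptions, not the dissertation's actual setup or tuning; the real algorithm uses carefully chosen optimistic bonuses and learning rates to obtain its guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 2, 2, 3   # states, actions, horizon (toy sizes, assumed)
K = 2000            # number of episodes
rho = 0.4           # per-episode utility budget the constraint must meet
eta = 10.0          # scaling of the virtual queue in the combined pseudo-Q
alpha = 0.1         # constant learning rate (a simplification)

# Toy tables: reward r(s,a), utility c(s,a), transitions P[s,a] -> next-state probs.
r = np.array([[1.0, 0.2], [0.5, 0.8]])
c = np.array([[0.1, 0.9], [0.7, 0.3]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.3, 0.7]]])

# The three components that give Triple-Q its name:
Qr = np.full((H, S, A), float(H))  # optimistic Q-function for cumulative reward
Qc = np.full((H, S, A), float(H))  # optimistic Q-function for cumulative utility
Z = 0.0                            # virtual queue tracking constraint "debt"

for k in range(K):
    s = 0
    ep_utility = 0.0
    for h in range(H):
        # Act greedily on the combined pseudo-Q: reward plus queue-weighted utility.
        a = int(np.argmax(Qr[h, s] + (Z / eta) * Qc[h, s]))
        s_next = int(rng.choice(S, p=P[s, a]))
        vr = Qr[h + 1, s_next].max() if h + 1 < H else 0.0
        vc = Qc[h + 1, s_next].max() if h + 1 < H else 0.0
        # Standard Q-learning-style updates for both Q-functions.
        Qr[h, s, a] += alpha * (r[s, a] + vr - Qr[h, s, a])
        Qc[h, s, a] += alpha * (c[s, a] + vc - Qc[h, s, a])
        ep_utility += c[s, a]
        s = s_next
    # The virtual queue grows when the episode's utility falls short of the
    # budget rho, pushing future action choices toward satisfying the constraint.
    Z = max(Z + rho - ep_utility, 0.0)
```

The key design idea visible even in this sketch is that the constraint is enforced without solving a constrained optimization at each step: the virtual queue length acts as an adaptive Lagrange-multiplier-like weight on the utility Q-function.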

In the second part, I will discuss a more challenging setting, non-stationary CMDPs, where the rewards/utilities and dynamics are time-varying and possibly unknown a priori. We propose the first model-free, simulator-free RL algorithms with sublinear regret and zero constraint violation for non-stationary CMDPs, in both the tabular and the linear function approximation settings. Our regret bound and constraint violation results for the tabular case match the corresponding best results for stationary CMDPs when the total variation budget is known. Additionally, we present a general framework for addressing the well-known challenges in analyzing non-stationary CMDPs without requiring prior knowledge of the variation budget.


CHAIR: Professor Lei Ying