Dissertation Defense

On the Importance of Inherent Structural Properties for Learning in Markov Decision Processes

Saghar Adler
1340 EECS (LNF Conference Room)

PASSCODE: 253356


Recently, reinforcement learning (RL) methodologies have been applied to sequential decision-making problems in fields such as autonomous control, communication, and resource allocation. Despite this practical success, progress on theoretical performance guarantees has been slower. By studying two different settings, we address the limitations of current theoretical frameworks by exploiting the inherent structural properties of Markov decision processes (MDPs). In the first setting, admission control in systems modeled by the Erlang-B blocking model with unknown arrival and service rates, we use model knowledge to compensate for the lack of reward signals. We propose a learning algorithm based on self-tuning adaptive control, prove that it is asymptotically optimal, and provide finite-time guarantees.
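For readers who want a concrete picture of the first setting, the sketch below illustrates the certainty-equivalence idea behind self-tuning adaptive control: estimate the unknown arrival and service rates from observations, then plug the estimates into the known Erlang-B model. The simulator, function names, and parameter values are illustrative assumptions, not the algorithm or guarantees from the dissertation.

```python
import random

def erlang_b(c, a):
    """Erlang-B blocking probability for c servers at offered load a = lambda/mu,
    computed with the standard recursion B(n) = a*B(n-1) / (n + a*B(n-1))."""
    b = 1.0
    for n in range(1, c + 1):
        b = a * b / (n + a * b)
    return b

def simulate_certainty_equivalence(true_lam=4.0, true_mu=1.0, c=5,
                                   horizon=10_000.0, seed=0):
    """Simulate an M/M/c/c loss system with unknown rates, form maximum-
    likelihood estimates online, and plug them into the Erlang-B formula
    (the certainty-equivalence idea; a toy stand-in for the thesis algorithm)."""
    rng = random.Random(seed)
    t, busy = 0.0, 0
    arrivals, completions, busy_time = 0, 0, 0.0
    while t < horizon:
        total_rate = true_lam + busy * true_mu   # competing exponential clocks
        dt = rng.expovariate(total_rate)
        busy_time += busy * dt
        t += dt
        if rng.random() < true_lam / total_rate:
            arrivals += 1                        # count all arrivals, blocked or not
            if busy < c:                         # admit only if a circuit is free
                busy += 1
        else:
            completions += 1
            busy -= 1
    # Certainty-equivalence estimates of the unknown rates:
    lam_hat = arrivals / t                       # arrivals per unit time
    mu_hat = completions / busy_time             # completions per unit busy-server time
    return lam_hat, mu_hat, erlang_b(c, lam_hat / mu_hat)
```

With the default parameters the estimates concentrate around the true rates, and the plugged-in blocking probability approaches the value the Erlang-B formula gives for the true offered load.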
In the second setting, we develop a framework for applying RL methods to MDPs with countably infinite state spaces and unbounded cost functions. We extend an existing Thompson sampling-based learning algorithm to countably infinite state spaces using the ergodicity properties of certain MDPs, and we establish asymptotic optimality of our policy by proving a regret bound that is sublinear in the time horizon. Finally, to demonstrate the applicability of our algorithm to queueing models of communication networks and computing systems, we apply it to the control of two different queueing systems with unknown dynamics.
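The episode structure of a Thompson sampling (posterior sampling) controller can be illustrated on a toy discrete-time queue with an unknown Bernoulli arrival probability: sample a model from the posterior, act for an episode with a policy tuned to the sample, and update the posterior from observations. The two-speed server, the threshold rule standing in for solving the sampled MDP, and all parameter values below are assumptions for illustration, not the algorithm analyzed in the dissertation.

```python
import random

def thompson_sampling_control(true_p=0.5, p_slow=0.3, p_fast=0.9,
                              fast_cost=1.0, episodes=100,
                              episode_len=50, seed=1):
    """Toy Thompson-sampling control loop for a discrete-time single queue
    with an unknown Bernoulli arrival probability true_p."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0          # Beta(alpha, beta) posterior over true_p
    q, total_cost, steps = 0, 0.0, 0
    for _ in range(episodes):
        p_sample = rng.betavariate(alpha, beta)   # sample a model from the posterior
        # Stand-in "policy" for the sampled model: if the sampled load exceeds
        # the slow service rate, serve fast whenever the queue is nonempty.
        threshold = 0 if p_sample > p_slow else 3
        for _ in range(episode_len):
            arrived = rng.random() < true_p
            alpha += arrived                       # conjugate Beta posterior update
            beta += 1 - arrived
            q += arrived
            use_fast = q > threshold
            p_serve = p_fast if use_fast else p_slow
            if q > 0 and rng.random() < p_serve:
                q -= 1
            total_cost += q + (fast_cost if use_fast else 0.0)
            steps += 1
    return alpha / (alpha + beta), total_cost / steps
```

Over episodes the posterior mean concentrates on the true arrival probability, so the sampled models, and hence the chosen policies, converge; the sublinear-regret analysis in the dissertation makes this intuition precise for countable-state MDPs.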


CHAIR: Professor Vijay Subramanian