Other Seminar

Leveraging Offline Data in Latent Decision Processes

Chinmaya Kausik, PhD Student, Mathematics, University of Michigan
WHERE:
1303 EECS Building

Abstract: In many sequential decision-making applications, such as recommender systems, training conversational agents, healthcare, and education, we have a wealth of offline data of short trajectories that can be used to make decisions offline or to accelerate online learning. However, it is crucial to account for unseen differences between users or environments in the offline data. Mixture/latent models of MDPs and bandits form a compelling model for this setting, and principled algorithms with end-to-end guarantees are needed. We discuss why it is challenging to work with unlabelled offline data, and present algorithms both for drawing conclusions offline and for accelerating online learning. We discuss end-to-end guarantees and experiments for both sets of algorithms. This talk is based on two papers by the author, Learning Mixtures of Markov Chains and MDPs (Oral, ICML 2023) and Leveraging Offline Data in Linear Latent Bandits (Under Review, 2025).

Bio: Chinmaya Kausik is a 4th-year PhD student in mathematics at the University of Michigan. His research interests lie in bridging theory and practice in reinforcement learning, bandits, and sequential decision-making. His past work has focused on latent information/partial observability, offline data, denoising, and preferential feedback. Previously, he was an undergraduate student in mathematics at the Indian Institute of Science. His research has been recognised by the Rackham International Student Fellowship.