Communications and Signal Processing Seminar

Online Learning from Preference-Feedback

Aadirupa SahaPhD CandidateIndian Institute of Science
1005 EECS BuildingMap
Aadirupa Saha

We consider the problem of sequentially learning a set of good items from a pool of n alternatives, solely based on preference data–at each round, the learner adaptively chooses a k-subset of the n items (k >=2), and receives a noisy observations of their relative preferences. The setting occurs in several domains, eg. design of surveys, expert reviews, web search, recommender systems, ranking in multiplayer games etc. Note that classical online learning approaches, eg. Mulltiarmed Bandits (Auer et al༾), are however inadequate to express relative choices, as they typically model the absolute reward feedback of the selected items. The famously studied Dueling Bandits (Yue-Joachimsཅ) first attempted to address the problem with pairwise preferences (ie. k=2), but the more realistic setting of learning from any general k-subsetwise preference (k>=2) is mostly unexplored yet! We frame this problem as “Battling Bandits”, modeling the underlying subsetwise feedback with parametric choice models, and propose efficient algorithms for different objectives: Identifying the best item, top-set, full ranking etc., in both PAC and regret minimization setting, and also analyze the corresponding problem lower bounds to judge optimality of our methods. Finally we also extend the problem to a more practical and challenging setup of infinite decision spaces (n is infinite). With Aditya Gopalan (IISc, Bangalore), Branislav Kveton and Ofer Meshi (Google). Appeared in UAIཎ, ALTཏ, AIStatsཏ.



Aadirupa Saha is a PhD student at department of Computer Science and Automation (CSA), Indian Institute of Science (IISc), Bangalore and currently a research intern at Google, Mountain View (CA). Her research interests broadly lies in the areas of Machine Learning, Statistical Learning Theory and Optimization. Her current research specifically focuses on decision making under uncertainty from sequential data, and online rank aggregation problems.

Sponsored by



Judi Jones(734) 763-8557

Faculty Host

Vijay Subramanian