Problems in Pure Exploration Multi-Armed Bandits with Multi-Dimensional Feedback and Criteria
Add to Google Calendar
Many applications can be modeled as follows: an agent is given access to several distributions and she wishes to determine those that meet some pre-specified criteria by sampling from the distributions in a sequential experiment. For example in crowdsourcing, it is important to identify high-quality workers from a large pool of prospective workers, e.g., those who give the correct answer with the highest probability on a random question. Pure exploration multi-armed bandits provide a framework for designing statistically efficient adaptive algorithms for solving these problems. Most of the literature has focused on settings where at each round of the sequential experiment the feedback is scalar-valued (i.e., one random number is observed). In this thesis, I argue that many applications exhibit multi-dimensional feedback and criteria. For example, in crowdsourcing, it is natural to require that a worker give the correct answer with high probability and at a suitable pace (e.g., within 15 seconds) and there is feedback both on whether a worker is correct and on her required response time. To study and solve applications of this multi-dimensional nature, I introduce novel pure exploration multi-armed bandit problems and design new algorithms that enjoy both strong theoretical guarantees and excellent empirical performance.