Online Learning in Bandit Problems
Add to Google Calendar
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward which evolves stochastically over time. We consider bandit problems in an online framework which involves sequential decision-making under uncertainty. Within the context of this class of problems, agents who are initially unaware of the stochastic evolution of the arms, aim to maximize a common objective based on the history of actions and observations. Bandit problems have diverse applications including cognitive radio networks, network routing, internet advertising, web search, clinical trials and contract design.
Since the characteristics of agents for each one of these applications are different, our goal is to provide an agent-centric approach in designing online learning algorithms for bandit problems. For a single agent, we show that different performance levels ranging from optimality in weak regret to strong optimality are achievable for Markovian arms depending on the computational power of the agent. We also consider the novel area of multi-agent bandits which has informational decentralization and communication aspects not present in single-agent bandits, and develop distributed online learning algorithms that are optimal in terms of weak regret based on communication and computation constraints.