- Title
Percentile optimization in multi-armed bandit problems.
- Authors
Ghatrani, Zahra; Ghate, Archis
- Abstract
A multi-armed bandit (MAB) problem is described as follows. At each time-step, a decision-maker selects one arm from a finite set. A reward is earned from this arm, and the state of that arm evolves stochastically. The goal is to determine an arm-pulling policy that maximizes expected total discounted reward over an infinite horizon. We study MAB problems where the rewards are multivariate Gaussian, to account for data-driven estimation errors. We employ a percentile optimization approach, wherein the goal is to find an arm-pulling policy that maximizes the sum of percentiles of expected total discounted rewards earned from individual arms. The idea is motivated by recent work on percentile optimization in Markov decision processes. We demonstrate that, when applied to MABs, this yields an intractable second-order cone program (SOCP) whose size is exponential in the number of arms. We use Lagrangian relaxation to break the resulting curse of dimensionality. Specifically, we show that the relaxed problem can be reformulated as an SOCP with size linear in the number of arms. We propose three approaches to recover feasible arm-pulling decisions at run time from an off-line optimal solution of this SOCP. Our numerical experiments suggest that one of these three methods is more effective than the other two.
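The percentile idea for Gaussian reward estimates can be illustrated with a minimal sketch (this is an illustration, not the paper's SOCP formulation; the arm parameters and the single-arm scoring rule below are hypothetical). For a reward estimate distributed N(mu, sigma^2), the alpha-percentile is mu + sigma * Phi^{-1}(alpha), so a low alpha penalizes arms with large estimation error:

```python
from statistics import NormalDist

def gaussian_percentile(mu: float, sigma: float, alpha: float) -> float:
    """alpha-percentile of a N(mu, sigma^2) estimate of discounted reward."""
    return mu + sigma * NormalDist().inv_cdf(alpha)

# Hypothetical arms: (mean, std dev) of each arm's estimated discounted reward.
arms = [(10.0, 2.0), (12.0, 5.0), (9.0, 0.5)]
alpha = 0.1  # conservative 10th percentile

scores = [gaussian_percentile(mu, sigma, alpha) for mu, sigma in arms]
best_arm = max(range(len(arms)), key=lambda i: scores[i])
```

With alpha = 0.1, the low-variance third arm scores highest even though its mean is smallest, which is the risk-averse behavior percentile objectives are designed to produce.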
- Subjects
MULTI-armed bandit problem (Probability theory); DYNAMIC programming; MARKOV processes; PERCENTILES
- Publication
Annals of Operations Research, 2024, Vol 340, Issue 2/3, p837
- ISSN
0254-5330
- Publication type
Article
- DOI
10.1007/s10479-024-06165-4