- Title
Percentile optimization in multi-armed bandit problems.
- Authors
Ghatrani, Zahra; Ghate, Archis
- Abstract
A multi-armed bandit (MAB) problem is described as follows. At each time-step, a decision-maker selects one arm from a finite set. A reward is earned from this arm, and the state of that arm evolves stochastically. The goal is to determine an arm-pulling policy that maximizes expected total discounted reward over an infinite horizon. We study MAB problems where the rewards are multivariate Gaussian, to account for data-driven estimation errors. We employ a percentile optimization approach, wherein the goal is to find an arm-pulling policy that maximizes the sum of percentiles of expected total discounted rewards earned from individual arms. The idea is motivated by recent work on percentile optimization in Markov decision processes. We demonstrate that, when applied to MABs, this yields an intractable second-order cone program (SOCP) whose size is exponential in the number of arms. We use Lagrangian relaxation to break the resulting curse of dimensionality. Specifically, we show that the relaxed problem can be reformulated as an SOCP with size linear in the number of arms. We propose three approaches to recover feasible arm-pulling decisions at run time from an off-line optimal solution of this SOCP. Our numerical experiments suggest that one of these three methods is more effective than the other two.
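The percentile idea for Gaussian reward estimates can be illustrated with a minimal sketch (this is an illustration, not the paper's SOCP formulation; the arm parameters and the single-arm scoring rule below are hypothetical). For a reward estimate distributed N(mu, sigma^2), the alpha-percentile is mu + sigma * Phi^{-1}(alpha), so a low alpha penalizes arms with large estimation error:

```python
from statistics import NormalDist

def gaussian_percentile(mu: float, sigma: float, alpha: float) -> float:
    """alpha-percentile of a N(mu, sigma^2) estimate of discounted reward."""
    return mu + sigma * NormalDist().inv_cdf(alpha)

# Hypothetical arms: (mean, std dev) of each arm's estimated discounted reward.
arms = [(10.0, 2.0), (12.0, 5.0), (9.0, 0.5)]
alpha = 0.1  # conservative 10th percentile

scores = [gaussian_percentile(mu, sigma, alpha) for mu, sigma in arms]
best_arm = max(range(len(arms)), key=lambda i: scores[i])
```

With alpha = 0.1, the low-variance third arm scores highest even though its mean is smallest, which is the risk-averse behavior percentile objectives are designed to produce.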
- Subjects
MULTI-armed bandit problem (Probability theory); DYNAMIC programming; MARKOV processes; PERCENTILES
- Publication
Annals of Operations Research, 2024, Vol 340, Issue 2/3, p837
- ISSN
0254-5330
- Publication type
Article
- DOI
10.1007/s10479-024-06165-4