We found a match
Your institution may have rights to this item. Sign in to continue.
- Title
Multi-Gear Bandits, Partial Conservation Laws, and Indexability †.
- Authors
Niño-Mora, José
- Abstract
This paper considers what we propose to call multi-gear bandits, which are Markov decision processes modeling a generic dynamic and stochastic project fueled by a single resource and which admit multiple actions representing gears of operation naturally ordered by their increasing resource consumption. The optimal operation of a multi-gear bandit aims to strike a balance between project performance costs or rewards and resource usage costs, which depend on the resource price. A computationally convenient and intuitive optimal solution is available when such a model is indexable, meaning that its optimal policies are characterized by a dynamic allocation index (DAI), a function of state–action pairs representing critical resource prices. Motivated by the lack of general indexability conditions and efficient index-computing schemes, and focusing on the infinite-horizon finite-state and -action discounted case, we present a verification theorem ensuring that, if a model satisfies two proposed PCL-indexability conditions with respect to a postulated family of structured policies, then it is indexable and such policies are optimal, with its DAI being given by a marginal productivity index computed by a downshift adaptive-greedy algorithm in A N steps, with A + 1 actions and N states. The DAI is further used as the basis of a new index policy for the multi-armed multi-gear bandit problem.
- Subjects
GREEDY algorithms; ROBBERS; MARGINAL productivity; MARKOV processes; CONSUMPTION (Economics); FAMILY policy
- Publication
Mathematics (2227-7390), 2022, Vol 10, Issue 14, pN.PAG
- ISSN
2227-7390
- Publication type
Article
- DOI
10.3390/math10142497