- Title
Infomax Strategies for an Optimal Balance Between Exploration and Exploitation.
- Authors
Reddy, Gautam; Celani, Antonio; Vergassola, Massimo
- Abstract
Proper balance between exploitation and exploration is what makes good decisions that achieve high reward, like payoff or evolutionary fitness. The Infomax principle postulates that maximization of information directs the function of diverse systems, from living systems to artificial neural networks. While specific applications turn out to be successful, the validity of information as a proxy for reward remains unclear. Here, we consider the multi-armed bandit decision problem, which features arms (slot-machines) of unknown probabilities of success and a player trying to maximize cumulative payoff by choosing the sequence of arms to play. We show that an Infomax strategy (Info-p) which optimally gathers information on the highest probability of success among the arms, saturates known optimal bounds and compares favorably to existing policies. Conversely, gathering information on the identity of the best arm in the bandit leads to a strategy that is vastly suboptimal in terms of payoff. The nature of the quantity selected for Infomax acquisition is then crucial for effective tradeoffs between exploration and exploitation.
- Subjects
LARGE deviations (Mathematics); ARTIFICIAL neural networks; MULTI-armed bandit problem (Probability theory); DECISION theory; INFORMATION theory
- Publication
Journal of Statistical Physics, 2016, Vol 163, Issue 6, p1454
- ISSN
0022-4715
- Publication type
Article
- DOI
10.1007/s10955-016-1521-0
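
The bandit setting described in the abstract can be illustrated with a small simulation. The sketch below is not the Info-p strategy from the article, whose details are not given in this record. It uses Thompson sampling, a standard bandit policy, as a stand-in to show the problem setup: arms with unknown success probabilities and a player accumulating payoff. The function name `run_bandit`, its parameters, and the example probabilities are hypothetical choices made for illustration.

```python
import random


def run_bandit(probs, horizon, seed=0):
    """Play a Bernoulli bandit with Thompson sampling (a stand-in policy,
    NOT the article's Info-p). Returns (payoff, pulls, regret)."""
    rng = random.Random(seed)
    n_arms = len(probs)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms   # number of times each arm was played
    payoff = 0

    for _ in range(horizon):
        # Draw one plausible success probability per arm from its
        # Beta(wins + 1, losses + 1) posterior (uniform prior assumed).
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(n_arms)]
        # Play the arm whose sampled probability is highest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # The true probabilities are hidden from the policy; they are
        # used only to generate the reward.
        reward = 1 if rng.random() < probs[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
        payoff += reward

    # Expected payoff lost relative to always playing the best arm.
    best = max(probs)
    regret = sum(pulls[i] * (best - probs[i]) for i in range(n_arms))
    return payoff, pulls, regret


if __name__ == "__main__":
    payoff, pulls, regret = run_bandit([0.2, 0.5, 0.8], horizon=500, seed=1)
    print("payoff:", payoff, "pulls per arm:", pulls, "regret:", regret)
```

Early rounds spread plays across arms (exploration); as evidence accumulates, plays concentrate on the arm that looks best (exploitation). The regret value measures how much expected payoff that exploration cost.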