Reddy, Gautam; Celani, Antonio; Vergassola, Massimo

doi:10.1007/s10955-016-1521-0

Title: Infomax Strategies for an Optimal Balance Between Exploration and Exploitation.
Authors: Reddy, Gautam; Celani, Antonio; Vergassola, Massimo
Abstract: Proper balance between exploitation and exploration is what makes good decisions that achieve high reward, like payoff or evolutionary fitness. The Infomax principle postulates that maximization of information directs the function of diverse systems, from living systems to artificial neural networks. While specific applications turn out to be successful, the validity of information as a proxy for reward remains unclear. Here, we consider the multi-armed bandit decision problem, which features arms (slot-machines) of unknown probabilities of success and a player trying to maximize cumulative payoff by choosing the sequence of arms to play. We show that an Infomax strategy (Info-p) which optimally gathers information on the highest probability of success among the arms, saturates known optimal bounds and compares favorably to existing policies. Conversely, gathering information on the identity of the best arm in the bandit leads to a strategy that is vastly suboptimal in terms of payoff. The nature of the quantity selected for Infomax acquisition is then crucial for effective tradeoffs between exploration and exploitation.
Subjects: LARGE deviations (Mathematics); ARTIFICIAL neural networks; MULTI-armed bandit problem (Probability theory); DECISION theory; INFORMATION theory
Publication: Journal of Statistical Physics, 2016, Vol 163, Issue 6, p1454
ISSN: 0022-4715
Publication type: Article
DOI: 10.1007/s10955-016-1521-0
