- Title
Empirical Dynamic Programming.
- Authors
Haskell, William B.; Jain, Rahul; Kalathil, Dileep
- Abstract
We propose empirical dynamic programming algorithms for Markov decision processes. In these algorithms, the exact expectation in the Bellman operator of classical value iteration is replaced by an empirical estimate, yielding "empirical value iteration" (EVI). Similarly, policy evaluation and policy improvement in classical policy iteration are replaced by simulation, yielding "empirical policy iteration" (EPI). These empirical dynamic programming algorithms thus involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis, which we then use to give sample complexity bounds for both EVI and EPI. We also present variations and extensions, including asynchronous empirical dynamic programming and a minimax empirical dynamic program, and show how the latter can be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
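The core idea described in the abstract, replacing the exact expectation in the Bellman operator with a sample mean, can be sketched as follows. The toy two-state, two-action MDP, the sample size, and all variable names below are illustrative assumptions for exposition; they are not taken from the paper, and this sketch omits the paper's analysis of the resulting random operator.

```python
import numpy as np

# Illustrative sketch of empirical value iteration (EVI).
# The MDP below (2 states, 2 actions) is an assumed toy example.
rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
gamma = 0.9  # discount factor

# P[s, a] is the (assumed) distribution over next states; R[s, a] the reward.
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

def empirical_bellman(V, n_samples=2000):
    """One application of the empirical Bellman operator: the exact
    expectation sum_{s'} P[s, a, s'] V[s'] is replaced by the sample
    mean of V over n_samples simulated next states."""
    V_new = np.empty_like(V)
    for s in range(n_states):
        q = np.empty(n_actions)
        for a in range(n_actions):
            nxt = rng.choice(n_states, size=n_samples, p=P[s, a])
            q[a] = R[s, a] + gamma * V[nxt].mean()
        V_new[s] = q.max()
    return V_new

# Iterating the random operator from V = 0, as in classical value
# iteration but with simulated expectations.
V = np.zeros(n_states)
for _ in range(100):
    V = empirical_bellman(V)
```

Because each iteration applies a *random* operator, the iterates hover in a neighborhood of the classical fixed point rather than converging to it exactly, which is the motivation for the probabilistic fixed-point notions the abstract introduces.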
- Subjects
DYNAMIC programming; EMPIRICAL research; SIMULATION methods & models; RANDOM operators; ACCELERATION of convergence in numerical analysis; STOCHASTIC approximation
- Publication
Mathematics of Operations Research, 2016, Vol. 41, Issue 2, p. 402
- ISSN
0364-765X
- Publication type
Article
- DOI
10.1287/moor.2015.0733