- Title
Minimax weight learning for absorbing MDPs.
- Authors
Li, Fengying; Li, Yuqiang; Wu, Xianyi
- Abstract
Reinforcement learning policy evaluation problems are often modeled as finite-horizon or discounted/average-reward infinite-horizon Markov Decision Processes (MDPs). In this paper, we study undiscounted off-policy evaluation for absorbing MDPs. Given a dataset consisting of i.i.d. episodes under a given truncation level, we propose an algorithm (referred to as MWLA in the text) to directly estimate the expected return via the importance ratio of the state-action occupancy measure. A Mean Square Error (MSE) bound for the MWLA method is provided, and the dependence of the statistical error on the data size and the truncation level is analyzed. The performance of the algorithm is illustrated by computational experiments in an episodic taxi environment.
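The estimation step the abstract describes can be sketched in a few lines. This is a simplified, hypothetical illustration only: in MWLA the occupancy-ratio weights are learned by a minimax procedure, whereas here they are assumed to be already available, and the function names and data layout are invented for the example. For an absorbing MDP, the target policy's expected undiscounted return equals the behavior-data expectation of the weighted per-step rewards, where the weight is the ratio of target to behavior state-action occupancy measures.

```python
# Hypothetical sketch of occupancy-ratio off-policy evaluation.
# `weights` stands in for the ratio of target-policy to behavior-policy
# state-action occupancy measures, which MWLA estimates via minimax
# weight learning; here it is simply supplied as input.

def ope_estimate(episodes, weights):
    """Estimate the undiscounted expected return of the target policy.

    episodes: list of truncated trajectories, each a list of
              (state, action, reward) triples (i.i.d. episodes).
    weights:  dict mapping (state, action) to the estimated
              occupancy-measure importance ratio.
    """
    total = 0.0
    for ep in episodes:
        for s, a, r in ep:
            # Reweight each observed reward by the occupancy ratio.
            total += weights.get((s, a), 0.0) * r
    # Average the weighted cumulative reward over episodes.
    return total / len(episodes)
```

As a sanity check, when the target and behavior policies coincide the ratio is identically 1 and the estimator reduces to the empirical mean return of the episodes.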
- Subjects
STATISTICAL errors; STATISTICS; EXPECTED returns; MARKOV processes; REINFORCEMENT learning
- Publication
Statistical Papers, 2024, Vol. 65, Issue 6, p. 3545
- ISSN
0932-5026
- Publication type
Article
- DOI
10.1007/s00362-023-01491-4