- Title
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning.
- Authors
Xiaoteng Ma; Shuai Ma; Li Xia; Qianchuan Zhao
- Abstract
Keeping risk under control is often more crucial than maximizing expected reward in real-world decision-making settings such as finance, robotics, and autonomous driving. The most natural choice of risk measure is the variance, but it penalizes upside volatility as much as the downside. The (downside) semivariance, which captures the negative deviation of a random variable below its mean, is better suited to risk-averse purposes. This paper aims to optimize the mean-semivariance (MSV) criterion in reinforcement learning with respect to steady rewards. Since semivariance is time-inconsistent and does not satisfy the standard Bellman equation, traditional dynamic programming methods are not directly applicable to MSV problems. To tackle this challenge, we resort to Perturbation Analysis (PA) theory and establish the performance difference formula for MSV. We reveal that the MSV problem can be solved by iteratively solving a sequence of RL problems with a policy-dependent reward function. Further, we propose two on-policy algorithms based on policy gradient theory and the trust region method. Finally, we conduct diverse experiments, from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of the proposed methods.
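To illustrate the distinction the abstract draws between variance and downside semivariance, here is a minimal NumPy sketch (not the authors' code; the sample rewards are hypothetical). Only deviations below the mean contribute to the semivariance, so a large upside outlier inflates the variance but not the semivariance:

```python
import numpy as np

def semivariance(x):
    """Downside semivariance: mean squared deviation below the mean.

    Samples above the mean are clipped to zero deviation, so upside
    volatility is not penalized (unlike ordinary variance).
    """
    x = np.asarray(x, dtype=float)
    downside = np.minimum(x - x.mean(), 0.0)  # keep only negative deviations
    return float(np.mean(downside ** 2))

rewards = [1.0, 2.0, 3.0, 10.0]       # mean is 4.0; 10.0 is an upside outlier
print(np.var(rewards))                 # 12.5 — penalizes the upside outlier
print(semivariance(rewards))           # 3.5  — counts only the downside
```

The mean-semivariance criterion in the paper trades expected reward against this downside term, rather than against the symmetric variance.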
- Subjects
DECISION making; ALGORITHMS; QUANTUM perturbations; REINFORCEMENT learning; MACHINE learning
- Publication
Journal of Artificial Intelligence Research, 2022, Vol. 75, p. 569
- ISSN
1076-9757
- Publication type
Article
- DOI
10.1613/jair.1.13833