- Title
Double Deep Q-Network Integrating Contrastive Predictive Coding (融合对比预测编码的深度双Q网络)
- Authors
刘剑锋; 普杰信; 孙力帆
- Abstract
In a model-unknown partially observable Markov decision process (POMDP), the agent cannot directly access the true state of the environment, and this perceptual uncertainty makes learning an optimal policy challenging. Thus, a double deep Q-network reinforcement learning algorithm based on contrastive predictive coding representations is proposed. Belief states are modeled explicitly to obtain a compact and efficient history encoding for policy optimization. To improve data efficiency, a belief replay buffer is introduced, which reduces memory usage by storing belief transition pairs directly instead of observation and action sequences. In addition, a phased training strategy is designed to decouple representation learning from the policy learning process and thereby improve training stability. POMDP navigation tasks based on the Gym-MiniGrid environment are designed for evaluation. Experimental results show that the proposed algorithm captures state-relevant semantic information, which facilitates stable and efficient policy learning in POMDPs.
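The two core mechanisms the abstract names can be sketched in a few lines. The following is an illustrative sketch, not the authors' code: a standard double-DQN target computation (online network selects the action, target network evaluates it) over hypothetical tabular Q-values, plus a minimal belief replay buffer that stores belief transition pairs `(b, a, r, b')` directly rather than observation-action histories. All names and shapes here are assumptions for illustration.

```python
import random
from collections import deque

import numpy as np


def double_dqn_target(q_online, q_target, next_state, reward, gamma, done):
    """Double-DQN bootstrap target: the online Q selects the greedy action,
    the target Q evaluates it, which reduces overestimation bias.
    q_online / q_target are hypothetical (num_states, num_actions) arrays."""
    if done:
        return reward
    a_star = int(np.argmax(q_online[next_state]))      # selection: online net
    return reward + gamma * q_target[next_state, a_star]  # evaluation: target net


class BeliefReplayBuffer:
    """Stores belief transition tuples (b, a, r, b', done) directly,
    instead of full observation/action sequences, mirroring the memory
    saving described in the abstract (structure assumed, not from the paper)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, belief, action, reward, next_belief, done):
        self.buffer.append((belief, action, reward, next_belief, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

For example, with `q_online[0] = [1.0, 3.0]` and `q_target[0] = [2.0, 1.0]`, the online network picks action 1, so the target is `r + gamma * q_target[0, 1]`; a plain DQN would instead have used `max(q_target[0]) = 2.0`, illustrating the overestimation the double estimator avoids.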
- Subjects
PARTIALLY observable Markov decision processes; REINFORCEMENT learning; MACHINE learning
- Publication
Journal of Computer Engineering & Applications, 2023, Vol 59, Issue 6, p162
- ISSN
1002-8331
- Publication type
Article
- DOI
10.3778/j.issn.1002-8331.2110-0205