- Title
Double Deep Q-Network Integrating Contrastive Predictive Coding (融合对比预测编码的深度双Q网络)
- Authors
刘剑锋; 普杰信; 孙力帆
- Abstract
In a model-unknown partially observable Markov decision process (POMDP), the agent cannot directly access the true state of the environment, and this perceptual uncertainty makes learning an optimal policy challenging. Thus, a double deep Q-network reinforcement learning algorithm based on contrastive predictive coding representations is proposed. Belief states are modeled explicitly to obtain a compact and efficient history encoding for policy optimization. To improve data efficiency, a belief replay buffer is introduced, which reduces memory usage by storing belief transition pairs directly instead of observation and action sequences. In addition, a phased training strategy is designed to decouple representation learning from the policy learning process and thereby improve training stability. POMDP navigation tasks based on the Gym-MiniGrid environment are designed for evaluation. Experimental results show that the proposed algorithm captures state-relevant semantic information, which facilitates stable and efficient policy learning in POMDPs.
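The two core mechanisms the abstract names can be sketched in a few lines. The following is an illustrative sketch, not the authors' code: a standard double-DQN target computation (online network selects the action, target network evaluates it) over hypothetical tabular Q-values, plus a minimal belief replay buffer that stores belief transition pairs `(b, a, r, b')` directly rather than observation-action histories. All names and shapes here are assumptions for illustration.

```python
import random
from collections import deque

import numpy as np


def double_dqn_target(q_online, q_target, next_state, reward, gamma, done):
    """Double-DQN bootstrap target: the online Q selects the greedy action,
    the target Q evaluates it, which reduces overestimation bias.
    q_online / q_target are hypothetical (num_states, num_actions) arrays."""
    if done:
        return reward
    a_star = int(np.argmax(q_online[next_state]))      # selection: online net
    return reward + gamma * q_target[next_state, a_star]  # evaluation: target net


class BeliefReplayBuffer:
    """Stores belief transition tuples (b, a, r, b', done) directly,
    instead of full observation/action sequences, mirroring the memory
    saving described in the abstract (structure assumed, not from the paper)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, belief, action, reward, next_belief, done):
        self.buffer.append((belief, action, reward, next_belief, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

For example, with `q_online[0] = [1.0, 3.0]` and `q_target[0] = [2.0, 1.0]`, the online network picks action 1, so the target is `r + gamma * q_target[0, 1]`; a plain DQN would instead have used `max(q_target[0]) = 2.0`, illustrating the overestimation the double estimator avoids.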
- Subjects
PARTIALLY observable Markov decision processes; REINFORCEMENT learning; MACHINE learning
- Publication
Journal of Computer Engineering & Applications, 2023, Vol 59, Issue 6, p162
- ISSN
1002-8331
- Publication type
Article
- DOI
10.3778/j.issn.1002-8331.2110-0205