Title: UCAV Air Combat Maneuver Decisions Based on a Proximal Policy Optimization Algorithm with Situation Reward Shaping.
Authors: Yang, Kaibiao; Dong, Wenhan; Cai, Ming; Jia, Shengde; Liu, Ri
Abstract: Autonomous maneuver decision-making by an unmanned combat air vehicle (UCAV) is a critical part of air combat that requires both flight safety and tactical maneuvering. In this paper, a UCAV air combat maneuver decision method based on a proximal policy optimization (PPO) algorithm is proposed. First, a motion model of the UCAV and a situation assessment model of air combat were established to describe the motion situation of the UCAV. An enemy maneuver policy based on situation assessment with a greedy algorithm was also proposed for air combat confrontation, which served to verify the performance of the PPO algorithm. Then, an action space based on a basic maneuver library and a state observation space for the PPO algorithm were constructed, and a reward function with situation reward shaping was designed to accelerate convergence. Finally, a simulation of air combat confrontation was carried out, which showed that the agent using the PPO algorithm learned to combine a series of basic maneuvers, such as diving, climbing, and circling, into tactical maneuvers and eventually defeated the enemy. The winning rate of the PPO algorithm reached 62%, and the corresponding losing rate was only 11%.
Subjects: MATHEMATICAL optimization; REINFORCEMENT learning; GREEDY algorithms; REWARD (Psychology); ARMORED military vehicles
Publication: Electronics (2079-9292), 2022, Vol 11, Issue 16, p2602
ISSN: 2079-9292
Publication type: Article
DOI: 10.3390/electronics11162602
