Abstract
Since no single approach can reliably navigate robots through complex environments, we present a novel two-level hierarchical framework called JPS-IA3C (Jump Point Search improved Asynchronous Advantage Actor-Critic) in this paper for robot navigation in dynamic environments via continuous control signals. Its global planner JPS+ (P) is a variant of JPS (Jump Point Search), which efficiently computes an abstract path of neighboring jump points. Treated as subgoals, these nodes free Deep Reinforcement Learning (DRL)-based controllers from the notorious local-minimum problem. To satisfy the kinematic constraints and adapt to changing environments, we propose an improved A3C (IA3C) algorithm to learn the control policies for the robots' local motion. Moreover, the combination of modified curriculum learning and reward shaping helps IA3C build a novel reward-function framework that avoids the learning inefficiency caused by sparse rewards. We additionally strengthen the robots' temporal reasoning about the environment with a memory-based network. These improvements make the IA3C controller converge faster and become more robust to the incomplete, noisy information caused by partial observability. Simulated experiments show that, compared with existing methods, this JPS-IA3C hierarchy successfully outputs continuous commands to accomplish large-range navigation tasks with shorter paths and in less time, owing to reasonable subgoal selection and rational motions.
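The two-level hierarchy described above can be illustrated with a minimal sketch: a global planner supplies a sequence of jump-point subgoals, and a learned local controller emits continuous velocity commands toward each subgoal in turn. Everything below is hypothetical scaffolding rather than the paper's implementation: plan_jump_points stands in for the JPS+ (P) planner, and ia3c_policy replaces the trained recurrent actor-critic with a crude proportional rule, purely to show the control flow.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def plan_jump_points(start: Point, goal: Point) -> List[Point]:
    """Placeholder for the JPS+ (P) global planner: returns an abstract
    path of neighboring jump points used as subgoals. Here we simply
    interpolate straight-line waypoints for illustration."""
    steps = 4
    return [
        (start[0] + (goal[0] - start[0]) * t / steps,
         start[1] + (goal[1] - start[1]) * t / steps)
        for t in range(1, steps + 1)
    ]

def ia3c_policy(pose: Point, subgoal: Point) -> Tuple[float, float]:
    """Stand-in for the learned IA3C local controller: maps the current
    observation to a continuous (speed, heading) command. The real
    controller is a recurrent actor-critic network, not this rule."""
    dx, dy = subgoal[0] - pose[0], subgoal[1] - pose[1]
    dist = math.hypot(dx, dy)
    return min(1.0, dist), math.atan2(dy, dx)

def navigate(start: Point, goal: Point, tol: float = 0.1) -> None:
    """Two-level loop: the global level iterates over subgoals, the
    local level issues continuous commands until each one is reached."""
    pose, dt = start, 0.1
    for subgoal in plan_jump_points(start, goal):        # global level
        while math.hypot(subgoal[0] - pose[0],
                         subgoal[1] - pose[1]) > tol:
            v, heading = ia3c_policy(pose, subgoal)      # local level
            pose = (pose[0] + v * dt * math.cos(heading),
                    pose[1] + v * dt * math.sin(heading))
    print(f"reached goal, final pose ~= {pose}")

navigate((0.0, 0.0), (5.0, 3.0))
```

The subgoal-switching structure is what lets the local controller avoid local minima in this scheme: the learned policy only ever chases a nearby, reachable waypoint chosen by the global planner, never the distant final goal.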