Abstract
Although broad reinforcement learning (BRL) offers a more intelligent autonomous decision-making method for the collision avoidance problem of unmanned surface vehicles (USVs), the algorithm still suffers from overestimation and struggles to converge quickly because rewards are sparse over a large sea area. To overcome this dilemma, we propose double broad reinforcement learning based on hindsight experience replay (DBRL-HER) for the collision avoidance system of USVs, improving both the efficiency and the accuracy of decision-making. The algorithm decouples target action selection from target Q-value calculation to form the double broad reinforcement learning method, and then adopts hindsight experience replay so that the agent can learn from failed episodes, greatly improving sample utilization efficiency. After training in a grid environment, the collision avoidance success rate of the proposed algorithm was 31.9 percentage points higher than that of the deep Q network (DQN) and 24.4 percentage points higher than that of BRL. A high-fidelity Unity3D simulation platform was also designed to simulate USV motion, and an experiment on this platform fully verified the effectiveness of the proposed algorithm.
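For concreteness, the decoupling described above can be written as a minimal sketch of the standard double Q-learning target, assuming online parameters $\theta$ and target parameters $\theta^{-}$ (these symbols are introduced here for illustration and do not denote the paper's exact BRL network structure):

$$y_t = r_t + \gamma\, Q\big(s_{t+1},\ \arg\max_{a} Q(s_{t+1}, a;\, \theta);\ \theta^{-}\big),$$

where the online network selects the target action and the target network evaluates its Q value, mitigating the overestimation introduced by taking the max over noisy estimates. Hindsight experience replay then stores each failed transition again with a relabeled goal (e.g., the state actually reached), so that otherwise sparse rewards still yield a learning signal.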