Abstract
Autonomous urban driving remains an open problem, with ample room for improvement in unknown, complex environments. This paper proposes an end-to-end autonomous driving approach that combines Conditional Imitation Learning (CIL), Mask R-CNN, and Deep Deterministic Policy Gradient (DDPG). In the first stage, driving data is collected in CARLA, a high-fidelity driving simulator, and used to train a Mask R-CNN network for object detection and segmentation. The segmented images are fed into the CIL backbone for supervised Imitation Learning (IL). In the second stage, DDPG, a Reinforcement Learning algorithm, continues training from the weights learned by the pre-trained CIL model. Combining the two methods in this way is a novel design: initializing reinforcement learning from an imitation-trained policy can speed up training considerably and allows performance to exceed that of the human demonstrations. We conduct experiments on the CARLA urban driving benchmark. In the final experiments, our algorithm outperforms the original MP by 30%, CIL by 33%, and CIRL by 10% on the most difficult task, dynamic navigation, in new environments and new weather, demonstrating that the proposed two-stage framework generalizes well to unknown environments on navigation tasks.
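The two-stage idea can be illustrated with a minimal sketch. It is not the architecture used in this paper: the linear command-conditioned policy, the four command names, and the helpers `BranchedPolicy`, `imitation_step`, and `init_actor_from_pretrained` are all hypothetical and chosen only for illustration. The sketch shows a policy with one branch per high-level command being fitted to expert actions (stage one), and a deterministic actor that starts from a copy of those weights (the starting point of stage two).

```python
import copy
import random


class BranchedPolicy:
    """Toy command-conditioned policy: one linear head per high-level command.

    Hypothetical stand-in for a CIL network; real models use a CNN backbone.
    """

    def __init__(self, n_features, commands, seed=0):
        rng = random.Random(seed)
        self.weights = {
            c: [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for c in commands
        }

    def act(self, features, command):
        # Deterministic action from the branch selected by the command.
        w = self.weights[command]
        return sum(wi * xi for wi, xi in zip(w, features))


def imitation_step(policy, features, command, expert_action, lr=0.1):
    """One supervised gradient step on the squared error of the active branch."""
    error = policy.act(features, command) - expert_action
    w = policy.weights[command]
    for i, xi in enumerate(features):
        w[i] -= lr * error * xi
    return error ** 2  # loss measured before the update


def init_actor_from_pretrained(pretrained):
    """Stage two start: the actor begins as an independent copy of the
    imitation-trained policy, so later updates do not alter the original."""
    return copy.deepcopy(pretrained)


# Stage one: fit the "follow" branch to a single (assumed) expert sample.
commands = ["follow", "left", "right", "straight"]
policy = BranchedPolicy(n_features=2, commands=commands)
sample, expert = [1.0, 0.5], 0.3
losses = [imitation_step(policy, sample, "follow", expert) for _ in range(50)]

# Stage two: the actor is initialized from the pre-trained weights.
actor = init_actor_from_pretrained(policy)
```

In a full DDPG setup the copied actor would then be refined with critic-driven policy gradients and exploration noise; only the weight hand-off between the two stages is shown here.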