Resumen
Ducted-fan tail-sitter unmanned aerial vehicles (UAVs) provide versatility and unique benefits, attracting significant attention in various applications. This study focuses on developing a safe reinforcement learning method for back-transition control between level flight mode and hover mode for ducted-fan tail-sitter UAVs. Our method enables transition control with a minimal altitude change and transition time while adhering to the velocity constraint. We employ the Trust Region Policy Optimization, Proximal Policy Optimization with Lagrangian, and Constrained Policy Optimization (CPO) algorithms for controller training, showcasing the superiority of the CPO algorithm and the necessity of the velocity constraint. The transition trajectory achieved using the CPO algorithm closely resembles the optimal trajectory obtained via the well-known GPOPS-II software with the SNOPT solver. Meanwhile, the CPO algorithm also exhibits strong robustness under unknown perturbations of UAV model parameters and wind disturbance.