Resumen
Deep learning-based object detection is one of the most popular research topics. However, in cases where large-scale datasets are unavailable, the training of detection models remains challenging due to the data-driven characteristics of deep learning. Small object detection in infrared images is such a case. To solve this problem, we propose a YOLOv5-based framework with a novel training strategy based on the domain adaptation method. First, an auxiliary domain classifier is combined with the YOLOv5 architecture to compose a detection framework that is trainable using datasets from multiple domains while maintaining calculation costs in the inference stage. Secondly, a new loss function based on Wasserstein distance is proposed to deal with small-sized objects by overcoming the problem of the intersection over union sensitivity problem in small-scale cases. Then, a model training strategy inspired from domain adaptation and knowledge distillation is presented. Using the domain confidence output of the domain classifier as a soft label, domain confusion loss is backpropagated to force the model to extract domain-invariant features while training the model with datasets with imbalanced distributions. Additionally, we generate a synthetic dataset in both the visible light and infrared spectrum to overcome the data shortage. The proposed framework is trained on the MS COCO, VEDAI, DOTA, ADAS Thermal datasets along with a constructed synthetic dataset for human detection and vehicle detection tasks. The experimental results show that the proposed framework achieved the best mean average precision (mAP) of 64.7 and 57.5 in human and vehicle detection tasks. Additionally, the ablation experiment shows that the proposed training strategy can improve the performance by training the model to extract domain-invariant features.