Abstract
Water surface garbage significantly affects the protection of water environments and ecological balance, making its detection a critical task. Traditional supervised object detection methods require a large amount of annotated data. To address this issue, we propose a method that combines strong and weak supervision with CLIP (Contrastive Language-Image Pretraining) for water surface garbage object detection. First, we train on a dataset annotated with strong supervision, using traditional object detection algorithms to learn the location information of water surface garbage. Then, we input the water surface garbage images into CLIP's visual encoder to obtain visual feature representations. Simultaneously, we train CLIP's text encoder on textual description annotations to obtain textual feature representations of the images. Fusing the visual and textual features yields comprehensive feature representations. In the weak supervision training phase, we input these comprehensive feature representations into the object detection model and employ a training strategy that combines strong and weak supervision to detect and localize water surface garbage. To further improve the model's performance, we introduce attention mechanisms and data augmentation techniques that sharpen the model's focus on water surface garbage and increase its robustness. Experiments on two water surface garbage datasets validate the effectiveness of the proposed method.
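As a rough illustration of the fusion and combined-supervision steps described above, the following minimal sketch uses plain Python lists in place of real CLIP embeddings. The abstract does not specify the fusion operator or how the two supervision signals are weighted, so concatenation of L2-normalized vectors and a weighted sum of losses are assumptions made here for illustration only. The names `fuse_features`, `combined_loss`, and `weak_weight` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(visual, textual):
    """Fuse visual and textual features by concatenating their
    L2-normalized vectors (assumed fusion operator, not from the paper)."""
    return l2_normalize(visual) + l2_normalize(textual)


def combined_loss(strong_loss, weak_loss, weak_weight=0.5):
    """Weighted sum of the strongly and weakly supervised losses.
    `weak_weight` is a hypothetical hyperparameter."""
    return strong_loss + weak_weight * weak_loss


# Toy stand-ins for the outputs of the visual and text encoders.
fused = fuse_features([3.0, 4.0], [0.0, 2.0])
print(fused)  # -> [0.6, 0.8, 0.0, 1.0]
```

In a real pipeline the two input vectors would come from CLIP's visual and text encoders, and the fused representation would be passed to the detection model. Normalizing before concatenation keeps either modality from dominating purely because of its scale.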