Resumen
Cloud and cloud shadow detection are essential in remote sensing imagery applications. Few semantic segmentation models were designed specifically for clouds and their shadows. Based on the visual and distribution characteristics of clouds and their shadows in remote sensing imagery, this paper provides a multi-supervised feature fusion attention network. We design a multi-scale feature fusion block (FFB) for the problems caused by the complex distribution and irregular boundaries of clouds and shadows. The block consists of a fusion convolution block (FCB), a channel attention block (CAB), and a spatial attention block (SPA). By multi-scale convolution, FCB reduces excessive semantic differences between shallow and deep feature maps. CAB focuses on global and local features through multi-scale channel attention. Meanwhile, it fuses deep and shallow feature maps with non-linear weighting to optimize fusion performance. SPA focuses on task-relevant areas through spatial attention. With the three blocks above, FCB alleviates the difficulties of fusing multi-scale features. Additionally, it makes the network resistant to background interference while optimizing boundary detection. Our proposed model designs a class feature attention block (CFAB) to increase the robustness of cloud detection. The network achieves good performance on the self-made cloud and shadow dataset. This dataset is taken from Google Earth and contains remote sensing imagery from several satellites. The proposed model achieved a mean intersection over union (MIoU) of 94.10% on our dataset, which is 0.44% higher than the other models. Moreover, it shows high generalization capability due to its superior prediction results on HRC_WHU and SPARCS datasets.