Resumen
The extraction of community-scale green infrastructure (CSGI) poses challenges due to limited training data and the diverse scales of the targets. In this paper, we reannotate a training dataset of CSGI and propose a three-stage transfer learning method employing a novel hybrid architecture, MPViT-DeepLab, to help us focus on CSGI extraction and improve its accuracy. In MPViT-DeepLab, a Multi-path Vision Transformer (MPViT) serves as the feature extractor, feeding both coarse and fine features into the decoder and encoder of DeepLabv3+, respectively, which enables pixel-level segmentation of CSGI in remote sensing images. Our method achieves state-of-the-art results on the reannotated dataset.