Abstract
Accurate extraction of urban landscape features in the historic districts of China is essential for protecting cultural and historical heritage. In recent years, deep learning (DL)-based methods have made substantial progress in landscape feature extraction. However, a lack of annotated data and the complex scenes inside alleyways limit the performance of existing DL-based methods. To address this problem, we build a small yet comprehensive history-core street view (HCSV) dataset and propose a polarized attention-based landscape feature segmentation network (PALESNet). PALESNet employs a polarized self-attention block to discriminate each landscape feature in various situations and an atrous spatial pyramid pooling (ASPP) block to capture multi-scale features. In addition, a transfer learning module is introduced to supplement the network's knowledge, compensating for the shortage of labeled data and improving its learning capability in historic districts. Compared with other state-of-the-art methods, our network achieves the highest accuracy in the case study of the Beijing Core Area, with a mean intersection over union (mIoU) of 63.7% on the HCSV dataset, and could thus provide sufficient and accurate data for the further protection and renewal of Chinese historic districts.
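
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed for semantic segmentation: the intersection over union is taken per class and then averaged. It is an illustration only. The function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made here, not details of the evaluation protocol used for PALESNet.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes that occur in pred or target.

    pred and target are flat sequences of integer class labels, one per pixel.
    Skipping classes that appear in neither is an assumption of this sketch.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with six "pixels" and three hypothetical classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)  # per-class IoUs: 1/3, 2/3, 1/2
```

In this toy case the per-class values are 1/3, 2/3, and 1/2, so the mean is 0.5. A score such as 63.7% is read the same way, as the per-class overlap averaged across all landscape feature classes.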