Abstract
Mapping slums is vital for monitoring the Sustainable Development Goal (SDG) indicators. In the absence of reliable data, Remote Sensing (RS)-based approaches, particularly Deep Learning (DL) methods, have gained recognition and achieved high accuracy for slum mapping. However, RS alone has limitations in complex urban environments, and previous studies have shown the added value of combining ground-level information with RS. This research therefore aims to integrate Remote Sensing Imagery (RSI) and Street View Images (SVI) for slum mapping. The study area is the city of Jakarta, which exemplifies the challenge of distinguishing between slum and non-slum kampungs; these kampungs accommodate approximately 60% of Jakarta's population. The research compares the mapping results obtained by four DL networks: FCN-DK6, which uses only RSI; VGG16, which uses only SVI; and two networks that combine RSI and SVI (FCN-DK6-i and Modified FCN-DK6). The Modified FCN-DK6 network was further explored by integrating SVI at each convolutional layer in turn, yielding Modified FCN-DK6_1, Modified FCN-DK6_2, Modified FCN-DK6_3, Modified FCN-DK6_4, and Modified FCN-DK6_5. Experimental results demonstrate that combining RSI and SVI improves accuracy, with the gain depending on how and at what level of the FCN network the two are integrated. Modified FCN-DK6_2 outperforms both the other Modified FCN-DK6 variants and FCN-DK6-i.
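To make the idea of integrating SVI "at a given convolutional layer" concrete, the following minimal sketch may help. It is an illustration under stated assumptions, not the networks used in this research: the fusion rule (tiling an SVI feature vector over the spatial grid and concatenating it as extra channels), the function names `fuse_svi_into_rsi` and `run_with_fusion`, and the `fuse_at` parameter are all hypothetical, and the layers are stand-in callables rather than real convolutions.

```python
from typing import Callable, List

# A feature map stored as nested lists: [channels][height][width].
FeatureMap = List[List[List[float]]]


def fuse_svi_into_rsi(rsi_map: FeatureMap, svi_vector: List[float]) -> FeatureMap:
    """Append one constant-valued channel per SVI feature to an RSI feature map.

    Assumption: fusion by channel concatenation. The source does not state
    which fusion operation the networks actually use.
    """
    height = len(rsi_map[0])
    width = len(rsi_map[0][0])
    # Tile each SVI feature value over the full height x width grid.
    svi_channels = [
        [[value] * width for _ in range(height)] for value in svi_vector
    ]
    # List concatenation returns a new list; rsi_map itself is not modified.
    return rsi_map + svi_channels


def run_with_fusion(
    rsi_map: FeatureMap,
    svi_vector: List[float],
    layers: List[Callable[[FeatureMap], FeatureMap]],
    fuse_at: int,
) -> FeatureMap:
    """Apply `layers` in order, fusing SVI features just before layer `fuse_at`.

    `fuse_at` is 1-based and is a hypothetical analogue of the _1 ... _5
    suffix in the variant names.
    """
    x = rsi_map
    for index, layer in enumerate(layers, start=1):
        if index == fuse_at:
            x = fuse_svi_into_rsi(x, svi_vector)
        x = layer(x)
    return x
```

For example, with five placeholder layers, `run_with_fusion(rsi, svi, layers, fuse_at=2)` injects the SVI features before the second layer, so changing `fuse_at` is a simple way to compare integration depths in such a sketch.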