Resumen
Windows, as key components of building facades, have received increasing attention in facade parsing. Convolutional neural networks have shown promising results in window extraction. Most existing methods segment a facade into semantic categories and subsequently employ regularization based on the structure of manmade architectures. These methods merely concern the optimization of individual windows, without considering the spatial areas or relationships of windows. This paper presents a novel windows instance segmentation method based on Mask R-CNN architecture. The method features a spatial attention region proposal network and a relation module-enhanced head network. First, an attention module is introduced in the region proposal network to generate a spatial attention map, then the attention map is multiplied with the objectness scores of the classification branch. Second, for the head network, relation modules are added to model the spatial relationships between proposals. Appearance and geometric features are combined for instance recognition. Furthermore, we constructed a new window instance segmentation dataset with 1200 annotated images. With our dataset, the average precisions of our method on detection and segmentation increased from 53.1% and 53.7% to 56.4% and 56.7% compared with Mask R-CNN. A comparison with state-of-the-art methods also proves the predominance of our proposed method.