Abstract
Road crack detection is an important problem in traffic safety and urban planning. Road damage varies in type and scale, and often differs in size and depth, which makes the detection task challenging. To address this problem, we propose a Cross-Attention-guided Feature Alignment Network (CAFANet) that extracts and integrates multi-scale features of road damage. First, we use a dual-branch visual encoder with an identical structure but different patch sizes (one large, one small) to extract multi-level damage features, and a Cross-Layer Interaction (CLI) module to establish interaction between the corresponding layers of the two branches, combining their distinct feature extraction capabilities and contextual understanding. Second, we employ a Feature Alignment Block (FAB) to align features from different levels or branches both semantically and spatially, which significantly improves CAFANet's perception of damage regions, reduces background interference, and enables more precise detection and segmentation of damage. Finally, we adopt multi-layer convolutional segmentation heads to obtain high-resolution feature maps. To validate the effectiveness of our approach, we conduct experiments on the public CRACK500 dataset and compare against other mainstream methods. Experimental results demonstrate that CAFANet achieves excellent performance on road crack detection, with significant improvements in F1 score and accuracy, reaching an F1 score of 73.22% and an accuracy of 96.78%.
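The dual-branch idea described above can be illustrated with a minimal sketch: the same image is patchified at two patch sizes, and tokens from one branch attend to tokens of the other via cross-attention. All function names, dimensions, and random projections below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, patch):
    """Split an HxW image into non-overlapping patch*patch tokens (flattened)."""
    h, w = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch * patch))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, dim=16):
    """Queries from one branch attend to keys/values from the other branch.
    Random projection weights stand in for learned parameters (assumption)."""
    w = np.random.default_rng(1)
    wq = w.standard_normal((q_tokens.shape[1], dim))
    wk = w.standard_normal((kv_tokens.shape[1], dim))
    wv = w.standard_normal((kv_tokens.shape[1], dim))
    q, k, v = q_tokens @ wq, kv_tokens @ wk, kv_tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(dim))   # (n_q, n_kv) attention map
    return attn @ v                          # fused features per query token

img = rng.standard_normal((32, 32))
small = patchify(img, 4)   # 64 tokens, each of dimension 16 (small-patch branch)
large = patchify(img, 8)   # 16 tokens, each of dimension 64 (large-patch branch)
fused = cross_attention(small, large)
print(fused.shape)  # (64, 16): each small-patch token enriched by the large-patch branch
```

In the full model this interaction would be applied at corresponding encoder layers in both directions; this sketch shows only a single small-to-large attention step.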