Abstract
Person re-identification (ReID) aims to identify the same pedestrian captured by different cameras from varying viewpoints across multiple scenarios. Occlusion is among the most challenging problems for practical applications. In video-based ReID tasks, motion information can be readily obtained from sampled frames and provides discriminative human-part representations. However, most motion-based methods are designed for video frames and are not suitable for single static-image input. In this paper, we propose a Motion-Aware Fusion (MAF) network that acquires motion information from static images to improve ReID performance. Specifically, a visual adapter is introduced to enable visual feature extraction from either image or video data. We design a motion consistency task to guide the motion-aware transformer to learn representative human-part motion information and substantially improve feature learning for occluded pedestrians. Extensive experiments on popular holistic, occluded, and video datasets demonstrate the effectiveness of the proposed method. It outperforms state-of-the-art approaches on the challenging Occluded-REID dataset, improving mean average precision (mAP) by 1.5% and rank-1 accuracy by 1.2%, and surpasses other methods on the MARS dataset with improvements of 0.2% in mAP and 0.1% in rank-1 accuracy.
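To make the described pipeline concrete, the following is a minimal PyTorch-style sketch of how a motion-aware fusion module might process either a single static image or a sampled video clip through a shared visual adapter and a motion-aware transformer. All module names (`VisualAdapter`, `MotionAwareFusion`), dimensions, and the classifier head are illustrative assumptions for exposition only; they do not reproduce the authors' implementation or the motion consistency task.

```python
# Hypothetical sketch of a motion-aware fusion forward pass.
# Module names and dimensions are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


class VisualAdapter(nn.Module):
    """Projects per-frame backbone features into a shared token space so that
    a single static image (T = 1) and a video clip (T > 1) share one interface."""

    def __init__(self, in_dim: int = 2048, embed_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, T, num_patches, in_dim) -> (batch, T, num_patches, embed_dim)
        return self.proj(feats)


class MotionAwareFusion(nn.Module):
    """Refines appearance tokens with a transformer over the (possibly
    single-frame) token sequence and pools them into one ID descriptor."""

    def __init__(self, embed_dim: int = 768, num_heads: int = 8, num_ids: int = 751):
        super().__init__()
        self.adapter = VisualAdapter(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.motion_transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(embed_dim, num_ids)

    def forward(self, backbone_feats: torch.Tensor) -> torch.Tensor:
        # backbone_feats: (batch, T, num_patches, in_dim)
        tokens = self.adapter(backbone_feats).flatten(1, 2)  # (batch, T*P, embed_dim)
        motion_tokens = self.motion_transformer(tokens)      # motion-aware refinement
        fused = motion_tokens.mean(dim=1)                     # pooled global descriptor
        return self.classifier(fused)


if __name__ == "__main__":
    model = MotionAwareFusion()
    image_feats = torch.randn(4, 1, 128, 2048)  # single static image: T = 1
    video_feats = torch.randn(4, 8, 128, 2048)  # sampled video clip: T = 8
    print(model(image_feats).shape, model(video_feats).shape)  # both (4, 751)
```

The key design point this sketch illustrates is that the adapter gives images and videos the same token-level interface, so the same motion-aware transformer can be applied regardless of whether one frame or several sampled frames are available.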