Abstract
Person re-identification (Re-ID) is a key technology in intelligent surveillance. Existing Re-ID methods are mainly built on convolutional neural networks (CNNs), whose down-sampling operations easily discard feature information. Moreover, CNNs process only one local neighbourhood at a time, which limits the network's global perception. To overcome these shortcomings, in this study we apply a pure transformer to the video-based Re-ID task by proposing an adaptive partitioning and multi-granularity (APMG) network framework. To help the pure transformer structure better adapt to the Re-ID task, we propose a new correlation-adaptive partitioning (CAP) module for feature embeddings that adaptively partitions person images according to structural correlations, thereby preserving the structure and semantics of local feature information in the images. To further improve Re-ID performance, we also propose a multi-granularity (MG) module that better captures person features at different levels of granularity. We conducted validation experiments on three video-based benchmark datasets. The results show that a network structure based on a pure transformer adapts well to Re-ID tasks, and that our APMG network outperforms other state-of-the-art methods.