Agglomerative Clustering and Residual-VLAD Encoding for Human Action Recognition

Ammar Mohsin Butt

Muhammad Haroon Yousaf

Fiza Murtaza

Saima Nazir

Serestina Viriri

Sergio A. Velastin

Abstract

Human action recognition has attracted significant attention in recent years because of high demand across many application domains. In this work, we propose a novel codebook generation and hybrid encoding scheme for the classification of action videos. The proposed scheme builds a discriminative codebook and a hybrid feature vector by encoding features extracted from convolutional neural networks (CNNs). We explore different CNN architectures for extracting spatio-temporal features. We employ an agglomerative clustering approach for codebook generation, which aims to combine the advantages of global and class-specific codebooks. We propose a Residual Vector of Locally Aggregated Descriptors (R-VLAD) and fuse it with locality-based coding to form a hybrid feature vector that provides a compact representation along with high-order statistics. We evaluated our work on two publicly available standard benchmark datasets, HMDB-51 and UCF-101. The proposed method achieves recognition accuracies of 72.6% and 96.2% on HMDB-51 and UCF-101, respectively. We conclude that the proposed scheme improves accuracy in human action recognition.
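For readers unfamiliar with the encoding family the abstract refers to, the following is a minimal sketch of standard VLAD encoding, not the R-VLAD variant or the hybrid scheme proposed in the paper, whose details are not given here. It assumes a codebook is already available (for example, from a clustering step); the function name `vlad_encode` and its arguments are hypothetical and chosen only for illustration.

```python
import numpy as np

def vlad_encode(descriptors, codebook):
    """Standard VLAD (illustrative sketch, not the paper's R-VLAD).

    descriptors: (N, D) array of local features, e.g. CNN activations.
    codebook:    (K, D) array of codewords, assumed to be precomputed.
    Returns an L2-normalized vector of length K * D.
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)

    # Squared Euclidean distance from every descriptor to every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # hard assignment to the closest codeword

    # Accumulate residuals (descriptor minus codeword) per codeword.
    vlad = np.zeros_like(codebook)
    for k in range(codebook.shape[0]):
        members = descriptors[nearest == k]
        if len(members):
            vlad[k] = (members - codebook[k]).sum(axis=0)

    # Flatten and L2-normalize; an all-zero vector is returned unchanged.
    vlad = vlad.ravel()
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

Each video would be represented by one such vector, which could then be passed to a classifier. The hard assignment and plain L2 normalization used here are the simplest common choices and are assumptions of this sketch.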

Keywords

action recognition - bag-of-words - deep residual networks - clustering - feature encoding - classification

Access

ISSUE

Volume: 10, Issue: 12 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app10124412
