Abstract
This paper proposes a novel approach to activity recognition in which videos are compressed with a video encoder and the resulting coding variables are used to build feature vectors. We remove the temporal dimension of these features by computing the mean and standard deviation of each variable across all video frames, so that each video is represented by a single feature vector of 67 variables. For the motion vectors, we remove the temporal dimension by projecting their phases using PCA, which represents each video by a single feature vector whose length equals the number of frames in the video. Consequently, complex sequence classifiers such as LSTMs can be avoided and classical machine learning techniques can be used instead. Experiments on the JHMDB dataset yield average classification accuracies of 68.8% and 74.2% when using the projected phases of motion vectors and the video coding feature variables, respectively. The advantage of the proposed solution is that it relies on low-dimensional feature vectors and simple machine learning techniques.
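The two temporal reductions described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything specific in it is an assumption: the function names pool_coding_features and project_phases are hypothetical, the per-frame coding variables are assumed to arrive as a (num_frames, num_variables) array, the motion vectors as two (num_frames, num_blocks) arrays of horizontal and vertical components, and the PCA projection is assumed to keep only the first principal component so that one value per frame remains. The array sizes and random inputs are placeholders, not values from the paper.

```python
import numpy as np


def pool_coding_features(frame_features: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, num_variables) into a single vector.

    Assumption: the per-variable mean and standard deviation over
    frames are simply concatenated.
    """
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])


def project_phases(mv_x: np.ndarray, mv_y: np.ndarray) -> np.ndarray:
    """Reduce motion-vector phases to one value per frame.

    mv_x, mv_y: (num_frames, num_blocks) motion-vector components.
    Assumption: PCA keeps only the first principal component.
    """
    phases = np.arctan2(mv_y, mv_x)          # phase of every motion vector
    centered = phases - phases.mean(axis=0)  # PCA operates on centered data
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]


# Placeholder data standing in for one decoded video (illustrative sizes).
rng = np.random.default_rng(0)
num_frames, num_variables, num_blocks = 40, 5, 16
coding = rng.normal(size=(num_frames, num_variables))
mv_x = rng.normal(size=(num_frames, num_blocks))
mv_y = rng.normal(size=(num_frames, num_blocks))

coding_fv = pool_coding_features(coding)  # length 2 * num_variables
phase_fv = project_phases(mv_x, mv_y)     # length num_frames
print(coding_fv.shape, phase_fv.shape)
```

Under these assumptions each video yields one fixed vector of pooled coding statistics and one vector of projected phases with a single entry per frame, either of which could then be passed to a classical classifier.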