Abstract
Smartphones, smartwatches, fitness trackers, and ad-hoc wearable devices are increasingly used to monitor human activities. Data acquired by the embedded sensors are usually processed by machine-learning-based algorithms to classify human activities. The success of these algorithms depends largely on the availability of labeled training data that, if made publicly available, would allow researchers to make objective comparisons between techniques. At present, only a few datasets are publicly available; they often contain samples from subjects with very similar characteristics and frequently lack the information needed to select subsets of samples according to specific criteria. In this article, we present a new dataset of acceleration samples, acquired with an Android smartphone, designed for human activity recognition and fall detection. The dataset includes 11,771 samples of both human activities and falls performed by 30 subjects aged 18 to 60 years. Samples are divided into 17 fine-grained classes grouped into two coarse-grained classes: one containing samples of 9 types of activities of daily living (ADLs) and the other containing samples of 8 types of falls. The dataset stores all the information needed to select samples according to different criteria, such as the type of ADL performed, age, gender, and so on. Finally, the dataset has been benchmarked with four different classifiers and two different feature vectors. We evaluated four classification tasks: fall vs. no fall, 9 activities, 8 falls, and 17 activities and falls. For each classification task, we performed a 5-fold cross-validation (i.e., samples from all the subjects appear in both the training and the test data) and a leave-one-subject-out cross-validation (i.e., the test data contain the samples of a single subject, and the training data contain the samples of all the other subjects).
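The difference between the two evaluation protocols, and how the four classification tasks carve up the same samples, can be illustrated with a minimal sketch. It assumes each sample is a record with hypothetical keys `subject`, `label`, and `is_fall`; these field names, the placeholder labels `class_0` to `class_16`, and the helper functions are illustrative only and are not the dataset's actual file format or the authors' benchmarking code. The data below is synthetic.

```python
import random
from collections import defaultdict

def five_fold_splits(n_samples, k=5, seed=0):
    """Random k-fold CV: a subject's samples can appear in both train and test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, roughly equal folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

def leave_one_subject_out_splits(subject_ids):
    """One fold per subject: test = that subject's samples, train = all others."""
    by_subject = defaultdict(list)
    for idx, subj in enumerate(subject_ids):
        by_subject[subj].append(idx)
    for subj in sorted(by_subject):
        test = by_subject[subj]
        train = [i for i, s in enumerate(subject_ids) if s != subj]
        yield subj, train, test

def task_view(samples, task):
    """Return (index, label) pairs for one of the four classification tasks.

    Samples are hypothetical dicts with keys 'subject', 'label', 'is_fall'.
    """
    if task == "fall_vs_adl":  # 2 coarse-grained classes
        return [(i, "fall" if s["is_fall"] else "adl") for i, s in enumerate(samples)]
    if task == "adl":          # 9 ADL classes only
        return [(i, s["label"]) for i, s in enumerate(samples) if not s["is_fall"]]
    if task == "falls":        # 8 fall classes only
        return [(i, s["label"]) for i, s in enumerate(samples) if s["is_fall"]]
    if task == "all":          # all 17 fine-grained classes
        return [(i, s["label"]) for i, s in enumerate(samples)]
    raise ValueError(f"unknown task: {task}")

# Synthetic stand-in data: 30 subjects x 4 samples each.
# Placeholder classes 0-8 play the role of ADLs, 9-16 the role of falls.
samples = [
    {"subject": subj, "label": f"class_{n % 17}", "is_fall": n % 17 >= 9}
    for subj in range(30)
    for n in range(subj, subj + 4)
]
subjects = [s["subject"] for s in samples]

kfold = list(five_fold_splits(len(samples)))
loso = list(leave_one_subject_out_splits(subjects))
print(len(kfold), len(loso))  # 5 30
```

In the k-fold protocol a subject's samples are scattered across folds, so the classifier may have seen that person during training; leave-one-subject-out rules this out by construction, which is why it estimates performance on unseen people.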
Regarding the classification tasks, the major findings can be summarized as follows: (i) it is quite easy to distinguish between falls and ADLs, regardless of the classifier and the feature vector selected, because these two classes of activities present quite different acceleration shapes, which simplifies the recognition task; (ii) on average, it is more difficult to distinguish between types of falls than between types of ADLs, again regardless of the classifier and the feature vector selected. This is due to the similarity among the acceleration shapes of different kinds of falls, whereas the acceleration shapes of ADLs differ from one another, except within a small group. Finally, the evaluation shows that the presence of samples from the same subject in both the training and the test data increases the performance of the classifiers, regardless of the feature vector used. This happens because each person performs activities differently from other people, even when sharing the same physical characteristics.