Abstract
Researchers have increasingly focused on techniques for the automatic detection of dementia from the speech samples of affected individuals. Leveraging rapid advances in Deep Learning (DL) and Natural Language Processing (NLP), these techniques have shown great potential for dementia detection. In this context, this paper proposes a method for detecting dementia from subjects' transcribed speech. Unlike conventional methods that rely on advanced language models to assess a subject's ability to form coherent and meaningful sentences, our approach relies on the subject's center of focus and how it shifts over time as the subject describes the content of the cookie theft image, a picture commonly used to evaluate cognitive abilities. To do so, we divide the cookie theft image into regions of interest and identify, in each sentence spoken by the subject, which regions are being discussed. We employed a Long Short-Term Memory (LSTM) neural network to learn the differing patterns of dementia and control subjects and used it to perform a 10-fold cross-validation classification. Our experiments on the Pitt corpus from DementiaBank yielded an accuracy of 82.9% at the subject level and 81.0% at the sample level. By employing data-augmentation techniques, the accuracy at the two levels increased to 83.6% and 82.1%, respectively. Our proposed method outperforms most conventional methods, which reach, at best, an accuracy of 81.5% at the subject level.
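To make the pipeline described above concrete, the following is a minimal sketch (not the authors' code) of the core idea: each spoken sentence is encoded as a binary vector over hypothetical regions of interest (ROIs) of the cookie theft image, the per-subject sequence of such vectors is fed to an LSTM, and a sigmoid output distinguishes dementia from control subjects. The ROI count, sequence length, and hyperparameters here are illustrative assumptions, not values from the paper.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_ROIS = 8        # assumed number of image regions (boy, stool, sink, ...)
MAX_SENTENCES = 30  # assumed maximum number of sentences per transcript


def build_model():
    """Binary classifier over a sequence of per-sentence ROI vectors."""
    model = models.Sequential([
        layers.Input(shape=(MAX_SENTENCES, NUM_ROIS)),
        # Mask padded sentence slots (all-zero vectors) so the LSTM ignores them.
        layers.Masking(mask_value=0.0),
        layers.LSTM(32),                        # learns temporal patterns of focus shifts
        layers.Dense(1, activation="sigmoid"),  # dementia (1) vs. control (0)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


if __name__ == "__main__":
    # Toy random data standing in for the ROI-per-sentence encodings.
    X = np.random.randint(0, 2, size=(100, MAX_SENTENCES, NUM_ROIS)).astype("float32")
    y = np.random.randint(0, 2, size=(100,))
    model = build_model()
    model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```

In the actual evaluation, such a model would be trained and tested within each fold of the 10-fold cross-validation, with accuracy aggregated at the sample and subject levels.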