Abstract
Stunting is a condition in which children experience impaired growth and development, caused by malnutrition, repeated infections, and inadequate psychosocial stimulation. It often goes unrecognized because of a lack of awareness in the community. The first step towards developing a solution for stunting is therefore to understand the community's level of awareness of, and sentiment towards, stunting-related issues. Because online media are widely used in everyday life, they offer significant potential for providing such an understanding. Exploiting this potential, however, requires identifying at scale the documents in which lay people discuss stunting, so that community awareness and sentiment can be gauged accurately. We treat this identification task as a multi-class classification problem. Using data from the Indonesian context, we perform a benchmark study that compares four algorithms, namely logistic regression, naive Bayes, random forest, and support vector machine (SVM), across three feature representations: term occurrence, term presence, and term frequency-inverse document frequency (TF-IDF). SVM coupled with TF-IDF produced the highest accuracy, 0.98 with a standard deviation of 0.03, which we attribute to its capability to automatically model the interaction between features.
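To make the three feature representations concrete, the following minimal Python sketch computes term occurrence (raw counts), term presence (binary indicators), and TF-IDF weights for a toy corpus. It is an illustration only, not the pipeline used in the study: the function name `extract_features`, the English toy documents, the whitespace tokenization, and the textbook weighting tf × log(N / df) are all assumptions, since the abstract does not specify the exact TF-IDF variant or preprocessing.

```python
import math
from collections import Counter


def extract_features(docs):
    """Return (vocab, occurrence, presence, tfidf) for a list of strings.

    Hypothetical helper for illustration; not taken from the study.
    """
    # Naive lowercase + whitespace tokenization (assumption: a real
    # pipeline would need language-specific preprocessing).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    occurrence, presence, tfidf = [], [], []
    for doc in tokenized:
        counts = Counter(doc)
        # Term occurrence: how many times each vocabulary term appears.
        occ = [counts[term] for term in vocab]
        occurrence.append(occ)
        # Term presence: 1 if the term appears at all, else 0.
        presence.append([1 if c > 0 else 0 for c in occ])
        # TF-IDF (assumed variant): count * log(N / document frequency).
        tfidf.append(
            [c * math.log(n_docs / doc_freq[term]) for c, term in zip(occ, vocab)]
        )
    return vocab, occurrence, presence, tfidf


# Toy corpus (made up for illustration, not from the study's data).
docs = [
    "stunting affects child growth",
    "child growth needs nutrition nutrition",
    "football match tonight",
]
vocab, occurrence, presence, tfidf = extract_features(docs)
i = vocab.index("nutrition")
print(occurrence[1][i], presence[1][i], round(tfidf[1][i], 3))
```

In this sketch, a term repeated within a document raises its occurrence count and TF-IDF weight but leaves its presence value at 1, while a term shared by many documents receives a smaller TF-IDF weight than a rare one. Any of the three matrices could then be passed to a classifier such as an SVM.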