Benchmarking of Multi-Class Algorithms for Classifying Documents Related to Stunting

Retno Kusumaningrum

Titan A. Indihatmoko

Saesarinda R. Juwita

Alfi F. Hanifah

Khadijah Khadijah and Bayu Surarso

Abstract

Stunting is a condition in which children experience impaired growth and development, caused by malnutrition, repeated infections, and inadequate psychosocial stimulation. It often remains unrecognized due to a lack of awareness in the community. Therefore, the first step towards developing a solution for stunting is to understand the level of awareness and the sentiment of the community towards issues related to stunting. As online media are widely used in everyday life, they offer significant potential towards providing such an understanding. However, exploiting this potential requires extensive identification of documents containing discussions of stunting among lay people, to accurately gauge the awareness and sentiments of the community towards stunting. This task is a multi-class classification problem. We perform a benchmark study, using data from the Indonesian context, to comparatively evaluate the performances of four algorithms, i.e., logistic regression, naive Bayes, random forest, and support vector machine (SVM), and three extracted features, namely term occurrence, term presence, and term frequency-inverse document frequency (TF-IDF). The SVM method coupled with TF-IDF produced the highest accuracy value of 0.98, with a standard deviation of 0.03, due to its capability to automatically model the interaction between features.
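The three feature representations named above can be illustrated on a toy corpus. The following is a minimal sketch rather than the authors' pipeline: it assumes that "term occurrence" means raw term counts and "term presence" means a binary indicator, and it uses one common TF-IDF variant (raw count times ln(N / df)). The function name `build_features` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def build_features(docs):
    """Return (vocab, occurrence, presence, tfidf) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    occurrence, presence, tfidf = [], [], []
    for toks in tokenized:
        counts = Counter(toks)
        occ = [counts[t] for t in vocab]                    # raw counts
        occurrence.append(occ)
        presence.append([1 if c > 0 else 0 for c in occ])   # binary indicator
        # Assumed weighting: raw count times ln(N / df); other variants exist.
        tfidf.append([c * math.log(n_docs / df[t]) for c, t in zip(occ, vocab)])
    return vocab, occurrence, presence, tfidf


# Hypothetical toy documents, not taken from the study's data set.
docs = [
    "stunting affects child growth",
    "child growth needs nutrition nutrition",
    "football match tonight",
]
vocab, occurrence, presence, tfidf = build_features(docs)
```

Each of the three matrices has one row per document and one column per vocabulary term, so any of them could be passed to a classifier such as logistic regression, naive Bayes, random forest, or an SVM for comparison.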

Keywords

stunting - logistic regression - naive Bayes - random forest - support vector machine

Access

ISSUE

Volume: 10 Issue: 23 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app10238621
