Chinese Comma Disambiguation in Math Word Problems Using SMOTE and Random Forests

Jingxiu Huang

Qingtang Liu

Yunxiang Zheng and Linjing Wu

Abstract

Natural language understanding technologies play an essential role in automatically solving math word problems. When a machine reads a Chinese math word problem, comma disambiguation is a valuable step in transforming the problem statement into a structured representation, and it amounts to a class-imbalanced binary classification problem. To address this problem, we applied the synthetic minority oversampling technique (SMOTE) and random forests to comma classification after jointly optimizing their hyperparameters. We also propose a strict measure for evaluating the performance of deployed comma classification models on comma disambiguation in math word problems. To verify the effectiveness of random forest classifiers with SMOTE, we conducted two-stage experiments on two datasets using a collection of evaluation measures. Experimental results showed that random forest classifiers were significantly superior to baseline methods in Chinese comma disambiguation. Running SMOTE with hyperparameters optimized for the class distribution of each dataset is preferable to running it with default values. For practitioners, we suggest re-optimizing the hyperparameters of classification models whenever the parameter settings of SMOTE are changed.
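To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours, excluding x itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy, made-up data: 10 majority samples and 3 minority samples.
majority = [[float(i), float(i % 3)] for i in range(10)]
minority = [[0.0, 5.0], [1.0, 6.0], [2.0, 5.5]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In this sketch, `k` and the target class ratio (set here through `n_new`) play the role of the SMOTE hyperparameters. Changing either one changes the training distribution the classifier sees, which is consistent with the abstract's advice to re-tune the classifier after changing the SMOTE settings.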

Keywords

comma disambiguation - feature engineering - hyperparameter tuning - imbalanced learning - natural language understanding - random forests

Access

PAGES

pp. 0 - 0

ISSUE

Volume: 2 Issue: 4 (2021)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/ai2040044
