Abstract
Researchers in natural language processing (NLP) have devoted considerable effort to detecting and combating fake news using a variety of machine learning (ML) and deep learning (DL) techniques. In this paper, we review existing studies to understand and curtail the dissemination of fake news. Specifically, we conduct a benchmark study using a wide range of (1) classical ML algorithms such as logistic regression (LR), support vector machines (SVM), decision trees (DT), naive Bayes (NB), random forest (RF), XGBoost (XGB) and an ensemble of these algorithms; (2) advanced ML algorithms such as convolutional neural networks (CNNs), bidirectional long short-term memory (BiLSTM), bidirectional gated recurrent units (BiGRU), CNN-BiLSTM, CNN-BiGRU and a hybrid of these techniques; and (3) DL transformer-based models such as BERT-base and RoBERTa-base. The experiments use different pretrained word embedding methods on four well-known real-world fake news datasets (LIAR, PolitiFact, GossipCop and COVID-19) to examine how each technique performs across datasets. Furthermore, we compare context-independent embedding methods (e.g., GloVe) with BERT-base contextualised representations in terms of their effectiveness in detecting fake news. Relying solely on the news text, we achieve better results than the state-of-the-art results reported on these datasets. We hope this study provides useful insights for researchers working on fake news detection.