A Discrete Hidden Markov Model for SMS Spam Detection

Tian Xia and Xuemin Chen

Abstract

Many machine learning methods have been applied to short messaging service (SMS) spam detection, including traditional methods such as naïve Bayes (NB), the vector space model (VSM), and the support vector machine (SVM), and newer methods such as long short-term memory (LSTM) and the convolutional neural network (CNN). These methods are based on the well-known bag-of-words (BoW) model, which treats a document as an unordered collection of words. This assumption overlooks an important piece of information: word order. Moreover, term frequency, which counts the occurrences of each word in a message, cannot distinguish the importance of words, because SMS messages are so short. This paper proposes a new method based on the discrete hidden Markov model (HMM) that uses word order information and addresses the low term frequency issue in SMS spam detection. The widely used SMS spam dataset from the UCI Machine Learning Repository is used to analyze the performance of the proposed HMM method. Its overall performance is comparable to that of deep learning approaches using CNN and LSTM models. A Chinese SMS spam dataset with 2000 messages is used for further performance evaluation. Experiments show that the proposed HMM method is not language-sensitive and can identify spam with high accuracy on both datasets.
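
As a rough illustration of the word-order point made in the abstract, the sketch below compares a bag-of-words count with the likelihood that a discrete HMM assigns to a word sequence, computed with the standard forward algorithm. It is not the authors' model: the two-state HMM, its probabilities, the four-word vocabulary, and the helper names `bag_of_words` and `forward_likelihood` are all assumptions made up for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Term-frequency representation; word order is discarded."""
    return Counter(text.lower().split())


def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: probability of a symbol sequence under a discrete HMM.

    start[i]    = P(first state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


# Two messages containing the same words in a different order.
msg_a = "call now to win"
msg_b = "win to call now"

# The bag-of-words representations are identical.
same_bow = bag_of_words(msg_a) == bag_of_words(msg_b)

# Hypothetical 2-state HMM over a 4-word vocabulary (all numbers invented).
vocab = {"call": 0, "now": 1, "to": 2, "win": 3}
start = [0.9, 0.1]
trans = [[0.2, 0.8],
         [0.7, 0.3]]
emit = [[0.6, 0.1, 0.2, 0.1],
        [0.1, 0.5, 0.1, 0.3]]

p_a = forward_likelihood([vocab[w] for w in msg_a.split()], start, trans, emit)
p_b = forward_likelihood([vocab[w] for w in msg_b.split()], start, trans, emit)

print("same bag of words:", same_bow)
print("HMM likelihoods differ:", p_a != p_b)
```

Under these made-up parameters the two messages are indistinguishable to a term-frequency model, while the HMM scores them differently because the likelihood depends on the order in which the words are emitted.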

Keywords

short messaging service (SMS) - spam detection - hidden Markov model (HMM) - text classification - natural language processing (NLP)

PAGES

pp. 0 - 0

ISSUE

Volume: 10 Issue: 14 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Information
Journal of Marine Science and Engineering

DOI

https://doi.org/10.3390/app10145011

Similar articles

Construction of a matrix discrete model of a three-dimensional body for the reconstruction of its shape

Oleksandr Reuta, Hadi Hab Raman, Dmitry Mozgovoy pp. 6 - 16

A matrix model of the representation of spatial objects for the synthesis, reconstruction, and analysis of their shape is proposed. The model is built on the basis of discrete data about the object, such as, for example, raster images or readings of spat...

Journal: Eastern-European Journal of Enterprise Technologies

A steganographic method of improved resistance to the rich model-based analysis

Nikolay Kalashnikov, Olexandr Kokhanov, Olexandr Iakovenko, Nataliia Kushnirenko pp. 37 - 42

This paper addresses the task of developing a steganographic method to hide information, resistant to analysis based on the Rich model (which includes several different submodels), using statistical indicators for the distribution of the pairs of coeffic...

Journal: Eastern-European Journal of Enterprise Technologies

Power Quality Disturbance Classification Based on DWT and Multilayer Perceptron Extreme Learning Machine

Jidong Wang, Zhilin Xu and Yanbo Che

In order to effectively identify complex power quality disturbances, a power quality disturbance classification method based on empirical wavelet transform and a multi-layer perceptron extreme learning machine (ELM) is proposed. The model uses the discre...

Journal: Applied Sciences

A biomedical system based on fuzzy discrete hidden Markov model for the diagnosis of the brain diseases

Harun Uğuz, Ali Öztürk, Rıdvan Saraçoğlu, Ahmet Arslan pp. 1104 - 1114

Journal: EXPERT SYSTEMS WITH APPLICATIONS

Filtering of discrete-time systems hidden in discrete-time random measures

Aggoun, L. Benkherouf, L. pp. 273 - 282

Journal: MATHEMATICAL AND COMPUTER MODELLING
