Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Sicong Kuang and Brian D. Davison

Abstract

Twitter is a popular source for monitoring healthcare information and public health. However, tweets contain much noise: the presence of appropriate keywords in a tweet does not guarantee that the tweet is truly health-related, so traditional keyword-based classification is largely ineffective. Word embedding algorithms have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words model (CBOW), and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its context; to simplify the computation, the context is represented by the average of all words inside the context window. However, not all words in the context window contribute equally to predicting the target word. Greedily incorporating every word in the window limits the contribution of semantically useful words and brings noisy or irrelevant words into the learning process. Existing word embedding algorithms have also tried to learn weighted CBOW models, but their weights are based on pre-defined syntactic rules and ignore the task for which the embedding is learned. We instead propose learning weights based on each word's relative importance in the classification task. Our intuition is that such learned weights place more emphasis on words that have comparatively more to contribute to the downstream task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results show that these embeddings outperform existing techniques, with a relative accuracy improvement of over 9%.
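The abstract's central idea, replacing CBOW's uniform context average with task-driven word weights, can be illustrated with a small sketch. The title indicates that the weights are chi-square based; the sketch assumes the standard chi-square statistic of each word against a binary health/non-health label, computed from document frequencies, and uses that score directly as the word's weight in a weighted context average. The function names `chi_square_scores` and `context_vector`, the toy tweets, and the 2-dimensional vectors are all hypothetical. This is not a reproduction of either of the authors' two algorithms; it only shows how such weights can change the CBOW input vector.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def chi_square_scores(docs: Sequence[Sequence[str]],
                      labels: Sequence[int]) -> Dict[str, float]:
    """Chi-square statistic of each word against a binary label (1 = health-related).

    Built from the standard 2x2 contingency table of document counts:
      a = positive docs containing w,  b = negative docs containing w,
      c = positive docs without w,     d = negative docs without w.
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for w in set(tokens):  # count each word at most once per tweet
            if y == 1:
                df_pos[w] += 1
            else:
                df_neg[w] += 1
    scores = {}
    for w in set(df_pos) | set(df_neg):
        a, b = df_pos[w], df_neg[w]
        c, d = n_pos - a, n_neg - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the table is degenerate (e.g. w is in every tweet).
        scores[w] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def context_vector(context: Sequence[str],
                   vectors: Dict[str, List[float]],
                   weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Combine context-word vectors into a single CBOW input vector.

    weights=None gives plain CBOW (uniform average of the context words);
    otherwise each vector is scaled by its word's weight and the sum is
    divided by the total weight. Out-of-vocabulary words are skipped.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    norm = 0.0
    for w in context:
        if w not in vectors:
            continue
        alpha = 1.0 if weights is None else weights.get(w, 0.0)
        for i, x in enumerate(vectors[w]):
            total[i] += alpha * x
        norm += alpha
    return [x / norm for x in total] if norm > 0 else total


# Toy labeled tweets (1 = health-related, 0 = not); purely illustrative.
docs = [["flu", "fever", "today"], ["fever", "pitch", "game"],
        ["flu", "shot", "today"], ["great", "game", "today"]]
labels = [1, 0, 1, 0]
scores = chi_square_scores(docs, labels)
print(scores["flu"], scores["fever"])  # 4.0 0.0

# Hypothetical 2-d embeddings for the context words of one training example.
vectors = {"flu": [1.0, 0.0], "fever": [0.0, 1.0], "today": [1.0, 1.0]}
context = ["flu", "fever", "today"]
uniform = context_vector(context, vectors)
weighted = context_vector(context, vectors, scores)
print([round(x, 3) for x in uniform])   # [0.667, 0.667]
print([round(x, 3) for x in weighted])  # [1.0, 0.25]
```

In the toy data, "fever" occurs once in each class, so its chi-square score is 0 and it drops out of the weighted context, while "flu", which appears only in health-related tweets, dominates. A full implementation would feed this weighted average into CBOW training in place of the uniform mean; how the paper normalizes or otherwise transforms the weights is not stated in this abstract.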

Keywords

word embedding - healthcare - classification

ISSUE

Volume: 7 Issue: 8 (August 2017)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Informatics
Information
Applied Sciences

DOI

https://doi.org/10.3390/app7080846

Similar articles

Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning

Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy and Amir Hussain

Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing u...

Journal: Computers

MBTI Personality Prediction Using Machine Learning and SMOTE for Balancing Data Based on Statement Sentences

Gregorius Ryan, Pricillia Katarina and Derwin Suhartono

The rise of social media as a platform for self-expression and self-understanding has led to increased interest in using the Myers-Briggs Type Indicator (MBTI) to explore human personalities. Despite this, there needs to be more research on how other wor...

Journal: Information

An Abstractive Summarization Model Based on Joint-Attention Mechanism and a Priori Knowledge

Yuanyuan Li, Yuan Huang, Weijian Huang, Junhao Yu and Zheng Huang

An abstractive summarization model based on the joint-attention mechanism and a priori knowledge is proposed to address the problems of the inadequate semantic understanding of text and summaries that do not conform to human language habits in abstractiv...

Journal: Applied Sciences

Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)

Sergiu Zaharia, Traian Rebedea and Stefan Trausan-Matu

The research presented in the paper aims at increasing the capacity to identify security weaknesses in programming languages that are less supported by specialized security analysis tools, based on the knowledge gathered from securing the popular ones, f...

Journal: Applied Sciences

Chinese Lip-Reading Research Based on ShuffleNet and CBAM

Yixian Fu, Yuanyao Lu and Ran Ni

Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the na...

Journal: Applied Sciences
