Resumen
With the increasing popularity of Twitter as both a social media platform and a data source for companies, decision makers, advertisers, and even researchers alike, data have been so massive that manual labeling is no longer feasible. This research uses a semi-supervised approach to sentiment analysis of both English and Tagalog tweets using a base classifier. In this study involving the Philippines, where social media played a central role in the campaign of both candidates, the tweets during the widely contested race between the son of the Philippines? former President and Dictator, and the outgoing Vice President of the Philippines were used. Using Natural Language Processing techniques, these tweets were annotated, processed, and trained to classify both English and Tagalog tweets into three polarities: positive, neutral, and negative. Through the Self-Training with Multinomial Naïve Bayes as base classifier with 30% unlabeled data, the results yielded an accuracy of 84.83%, which outweighs other studies using Twitter data from the Philippines.