Resumen
Comparing two sets of documents to identify new topics is useful in many applications, like discovering trending topics from sets of scientific papers, emerging topic detection in microblogs, and interpreting sentiment variations in Twitter. In this paper, the main topic-modeling-based approaches to address this task are examined to identify limitations and necessary enhancements. To overcome these limitations, we introduce two separate frameworks to discover emerging topics through a filtered latent Dirichlet allocation (filtered-LDA) model. The model acts as a filter that identifies old topics from a timestamped set of documents, removes all documents that focus on old topics, and keeps documents that discuss new topics. Filtered-LDA also genuinely reduces the chance of using keywords from old topics to represent emerging topics. The final stage of the filter uses multiple topic visualization formats to improve human interpretability of the filtered topics, and it presents the most-representative document for each topic.