Abdelrhman Eldallal and Eduard Barbu
Automatic Keyphrase Extraction involves identifying essential phrases in a document. These keyphrases are crucial in various tasks, such as document classification, clustering, recommendation, indexing, searching, summarization, and text simplification. ...
Samuel R. Schrader and Eren Gultepe
The evaluation of similarities between natural languages often relies on prior knowledge of the languages being studied. We describe three methods for building phylogenetic trees and clustering languages without the use of language-specific information. ...
Chunchun Hu, Qin Liang, Nianxue Luo and Shuixiang Lu
Analysis of the spatiotemporal distribution of online public opinion topics can help understand the hotspots of public concern. The topic model is employed widely in public opinion topic clustering for social media data. In order to handle topic-clusteri...
Lukas Busch, Ruben van Heusden and Maarten Marx
Page stream segmentation (PSS) is the task of retrieving the boundaries that separate source documents given a consecutive stream of documents (for example, sequentially scanned PDF files). The task has recently gained more interest as a result of the di...
Liliya Demidova, Dmitry Zhukov, Elena Andrianova and Vladimir Kalinin
To solve the problem of text clustering according to semantic groups, we suggest using a model of a unified lexico-semantic bond between texts and a similarity matrix based on it. Using lexico-semantic analysis methods, we can create "term-document" matr...
Sergey Gorshkov, Eugene Ilyushin, Anastasia Chernysheva, Viacheslav Goiko, Dmitry Namiot
pp. 12-17
Topic modeling is one of the most widely used methods in text analysis. It can be used to select topics as well as to find the distribution of topics in each document of the corpus. In this article, we present a method for clustering co...
Laith Abualigah, Amir H. Gandomi, Mohamed Abd Elaziz, Abdelazim G. Hussien, Ahmad M. Khasawneh, Mohammad Alshinwan and Essam H. Houssein
Text clustering is one of the efficient unsupervised learning techniques used to partition a huge number of text documents into a set of clusters, such that each cluster contains similar documents and different clusters contain dissimilar documents. Na...
Ibraheem Al-Jadir, Kok Wai Wong, Chun Che Fung and Hong Xie
Feature Selection (FS) methods have been studied extensively in the literature, and they are a crucial component in machine learning techniques. However, unsupervised text feature selection has not been well studied in document clustering problems. Feat...
Oleksii Kungurtsev, Svitlana Zinovatna, Iana Potochniak, Nataliia Novikova
pp. 39-47
The aim of this research is to improve the quality of domain dictionaries by expanding the corpus of documents under study with short documents. A document model is proposed that makes it possible to define a short document and the need to combine it with other ...
Pranomkorn Ampornphan and Sutep Tongngam
A patent is an important document issued by the government to protect inventions or product design. Inventions consist of mechanical structures, production processes, quality improvements of products, and so on. Generally, goods or appliances in everyday...