JOURNAL
Eastern-European Journal of Enterprise Technologies

ARTICLE

TITLE

Development of a document classification method by using geodesic distance to calculate similarity of documents

Hung Vo-Trung

Abstract

Currently, the Internet gives people quick and convenient access to human knowledge through various channels such as web pages, social networks, digital libraries, and portals. However, because information is exchanged and updated so quickly, the volume of stored information (in the form of digital documents) is growing rapidly. We therefore face challenges in representing, storing, sorting, and classifying documents. In this paper, we present a new approach to text classification based on semi-supervised machine learning and the Support Vector Machine (SVM). The novelty of the study is that, instead of measuring the distance between vectors with the Euclidean distance, we use the geodesic distance. To do this, each text is first expressed as an n-dimensional vector. In the n-dimensional vector space, each vector is represented by a point; the geodesic distance is used to calculate the distance from a point to its nearby points, and these points are connected into a graph. Classification is then based on computing the shortest paths between vertices of the graph through a kernel function. We conducted experiments on Reuters articles covering 5 topics: Business, Markets, World, Politics, and Technology. To evaluate the proposed method, we compared the traditional SVM based on Euclidean distance with our proposed geodesic-distance method on this same data set. The results showed that the proposed method achieves a higher correct classification rate than the traditional Euclidean-distance SVM, by 3.2 % on average.
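The abstract describes the pipeline only at a high level (document vectors, a neighborhood graph, shortest paths, a kernel function, then an SVM). As a minimal sketch under stated assumptions, the Python below builds a k-nearest-neighbor graph over document vectors, approximates geodesic distances with Dijkstra shortest paths, and turns them into a kernel matrix. The edge weighting (local Euclidean distance between neighbors), the Gaussian kernel form exp(-d^2 / (2 sigma^2)), and the names `knn_graph`, `geodesic_distances`, `geodesic_kernel`, `k` and `sigma` are illustrative assumptions, not details taken from the paper.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Graph = Dict[int, List[Tuple[int, float]]]


def euclidean(a: Vector, b: Vector) -> float:
    """Plain Euclidean distance between two document vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points: List[Vector], k: int) -> Graph:
    """Link each point to its k nearest neighbors.
    Assumption: undirected edges weighted by the local Euclidean distance."""
    graph: Graph = {i: [] for i in range(len(points))}
    seen = set()
    for i, p in enumerate(points):
        neighbors = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in neighbors[:k]:
            edge = (min(i, j), max(i, j))
            if edge not in seen:  # avoid duplicate undirected edges
                seen.add(edge)
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def geodesic_distances(graph: Graph, source: int) -> List[float]:
    """Approximate geodesic distances as Dijkstra shortest-path lengths.
    Vertices unreachable from `source` stay at infinity."""
    dist = [math.inf] * len(graph)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def geodesic_kernel(points: List[Vector], k: int, sigma: float) -> List[List[float]]:
    """Hypothetical Gaussian-style kernel exp(-d_G^2 / (2 sigma^2)) over
    geodesic distances d_G; unreachable pairs get kernel value 0."""
    graph = knn_graph(points, k)
    return [
        [math.exp(-(d ** 2) / (2 * sigma ** 2)) for d in geodesic_distances(graph, i)]
        for i in range(len(points))
    ]


if __name__ == "__main__":
    # Toy curved "manifold": 7 points on a unit semicircle.
    pts = [(math.cos(math.pi * i / 6), math.sin(math.pi * i / 6)) for i in range(7)]
    g = knn_graph(pts, k=2)
    print("Euclidean 0->6:", round(euclidean(pts[0], pts[6]), 3))
    print("Geodesic  0->6:", round(geodesic_distances(g, 0)[6], 3))
```

On the semicircle toy data, the two endpoints are 2.0 apart in a straight line, but the graph path has to follow the curve, so the geodesic estimate comes out at about 3.04. This gap is the effect the geodesic approach relies on. The resulting matrix has the shape of input that an SVM with a precomputed kernel accepts (for example scikit-learn's `SVC(kernel="precomputed")`). However, a Gaussian of geodesic distances is not guaranteed to be positive semi-definite in general, so the kernel used in the paper may well differ.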

Keywords

text classification - machine learning - geodesic distance - Euclidean distance - SVM - NLP - kernel function

PAGES

pp. 25 - 32

ISSUE

Volume: 4 Issue: 2 Part: PP (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

South African Journal of Science and Technology
Water
Aerospace

DOI

https://doi.org/10.15587/1729-4061.2020.203866

Similar articles

Analysis of the Current Situation of Big Data MOOCs in the Intelligent Era Based on the Perspective of Improving the Mental Health of College Students

Hongfeng Sang, Liyi Ma and Nan Ma

A three-dimensional MOOC analysis framework was developed, focusing on platform design, organizational mechanisms, and course construction. This framework aims to investigate the current situation of big data MOOCs in the intelligent era, particularly fr...

Journal: Information

A Review of Document Image Enhancement Based on Document Degradation Problem

Yanxi Zhou, Shikai Zuo, Zhengxian Yang, Jinlong He, Jianwen Shi and Rui Zhang

Document image enhancement methods are often used to improve the accuracy and efficiency of automated document analysis and recognition tasks such as character recognition. These document images could be degraded or damaged for various reasons including ...

Journal: Applied Sciences

Investigating Novice Developers' Code Commenting Trends Using Machine Learning Techniques

Tahira Niazi, Teerath Das, Ghufran Ahmed, Syed Muhammad Waqas, Sumra Khan, Suleman Khan, Ahmed Abdelaziz Abdelatif and Shaukat Wasi

Code comments are considered an efficient way to document the functionality of a particular block of code. Code commenting is a common practice among developers to explain the purpose of the code in order to improve code comprehension and readability. Re...

Journal: Algorithms

Exploring and Visualizing Research Progress and Emerging Trends of Event Prediction: A Survey

Shishuo Xu, Jinbo Liu, Songnian Li, Su Yang and Fangning Li

Over the last decade, event prediction has drawn attention from both academic and industry communities, resulting in a substantial volume of scientific papers published in a wide range of journals by scholars from different countries and disciplines. How...

Journal: Applied Sciences

Towards a Method to Enable the Selection of Physical Models within the Systems Engineering Process: A Case Study with Simulink Models

Eduardo Cibrián, Jose María Álvarez-Rodríguez, Roy Mendieta and Juan Llorens

The use of different techniques and tools is a common practice to cover all stages in the development life-cycle of systems generating a significant number of work products. These artefacts are frequently encoded using diverse formats, and often require ...

Journal: Applied Sciences
