JOURNAL
Computation


ARTICLE

TITLE

Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE

Luis Ariosto Serna Cardona

Hernán Darío Vargas-Cardona

Piedad Navarro González

David Augusto Cardenas Peña and Álvaro Ángel Orozco Gutiérrez

Abstract

The recurrent use of databases with categorical variables in different applications demands new alternatives for identifying relevant patterns. Classification is an interesting approach for the recognition of this type of data. However, few methods for this purpose exist in the literature. Moreover, those techniques focus exclusively on kernels and suffer from accuracy problems and high computational cost. For this reason, we propose an identification approach for categorical variables that uses conventional classifiers (LDC, QDC, KNN, and SVM) together with different mapping techniques to increase the separability of classes. Specifically, we map the initial features (categorical attributes) to another space, using the Chi-square (C-S) as a measure of dissimilarity. Then, we employ t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the data to two or three features, which significantly reduces the computational time of the learning methods. We evaluate the performance of the proposed approach in terms of accuracy for several experimental configurations and public categorical datasets downloaded from the UCI repository, and we compare it with relevant state-of-the-art methods. Results show that the C-S mapping combined with t-SNE considerably reduces computational times in recognition tasks while preserving accuracy. Also, when we apply only the C-S mapping to the datasets, the separability of classes is enhanced and, consequently, the performance of the learning algorithms clearly increases.
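
The C-S mapping step described in the abstract can be illustrated with a minimal sketch. It assumes the chi-square distance used in correspondence analysis (the distance between row profiles of the one-hot indicator matrix), which may differ in detail from the formulation in the paper. The function names, the toy data, and the nearest-neighbor rule are hypothetical and chosen only for illustration. The t-SNE reduction and the LDC, QDC, and SVM classifiers are not shown; in practice the dissimilarity matrix could be passed to a t-SNE implementation that accepts precomputed distances.

```python
from collections import Counter
from math import sqrt


def category_counts(rows):
    """Count how often each category occurs in each attribute (column)."""
    n_attrs = len(rows[0])
    return [Counter(row[j] for row in rows) for j in range(n_attrs)]


def chi_square_dissimilarity(x, y, counts, n_rows):
    """Chi-square distance between two categorical samples.

    A mismatch on attribute j costs 1/count(x[j]) + 1/count(y[j]), so
    disagreeing on rare categories weighs more than disagreeing on
    common ones. Matching attributes contribute nothing.
    """
    n_attrs = len(x)
    total = 0.0
    for j in range(n_attrs):
        if x[j] != y[j]:
            # Assumption: a category never seen in the reference data
            # is treated as if it had been observed once.
            total += 1.0 / max(counts[j][x[j]], 1)
            total += 1.0 / max(counts[j][y[j]], 1)
    return sqrt(n_rows / n_attrs * total)


def dissimilarity_map(rows, reference_rows, counts):
    """Map each sample to its vector of dissimilarities to the references."""
    n = len(reference_rows)
    return [
        [chi_square_dissimilarity(r, ref, counts, n) for ref in reference_rows]
        for r in rows
    ]


def predict_1nn(query, train_rows, train_labels, counts):
    """Label a query with the class of its closest training sample."""
    n = len(train_rows)
    dists = [chi_square_dissimilarity(query, r, counts, n) for r in train_rows]
    return train_labels[dists.index(min(dists))]


# Hypothetical toy dataset: three categorical attributes, two classes.
train_rows = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("green", "long", "small"),
    ("green", "long", "large"),
]
train_labels = ["apple", "apple", "cucumber", "cucumber"]

counts = category_counts(train_rows)
D = dissimilarity_map(train_rows, train_rows, counts)
prediction = predict_1nn(("green", "long", "small"), train_rows, train_labels, counts)
```

Each row of `D` is a numeric representation of one categorical sample, which is what allows conventional classifiers and a subsequent dimensionality reduction to operate on data that originally had no numeric scale.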

Keywords

Chi-square - classification - t-SNE - categorical data - dissimilarity

PAGES

pp. 0 - 0

ISSUE

Volume: 8 Part: 4 (2020)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Algorithms
AI

DOI

https://doi.org/10.3390/computation8040104

Similar articles

A Layered KNN-SVM Approach to Predict Missing Values of Functional Requirements in Product Customization

Ye Gu, Shuyou Zhang, Lemiao Qiu, Zili Wang and Lichun Zhang

The conversion from functional requirements (FRs) to design parameters is the foundation of product customization. However, original customer needs usually result in incomplete FRs, limited by customers' incomprehension on the design requirements of thes...

Journal: Applied Sciences

SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features

Mimi Mukherjee and Matloob Khushi

Real-world datasets are heavily skewed where some classes are significantly outnumbered by the other classes. In these situations, machine learning algorithms fail to achieve substantial efficacy while predicting these underrepresented instances. To solv...

Journal: Applied System Innovation

Dynamic Coastal-Shelf Seascapes to Support Marine Policies Using Operational Coastal Oceanography: The French Example

Emilie Tew-Kai, Victor Quilfen, Marie Cachera and Martial Boutet

In the context of maritime spatial planning and the implementation of spatialized Good Environmental Status indicators in the Marine Strategy Framework Directive (MSFD), the definition of a mosaic composed of coherent and standardised spatial units is ne...

Journal: Journal of Marine Science and Engineering

Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives

Rodrigo L. Rose, Tejas G. Puranik and Dimitri N. Mavris

The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown i...

Journal: Aerospace

J48SS: A Novel Decision Tree Approach for the Handling of Sequential and Time Series Data

Andrea Brunello, Enrico Marzano, Angelo Montanari and Guido Sciavicco

Temporal information plays a very important role in many analysis tasks, and can be encoded in at least two different ways. It can be modeled by discrete sequences of events as, for example, in the business intelligence domain, with the aim of tracking t...

Journal: Computers
