EvoSplit: An Evolutionary Approach to Split a Multi-Label Data Set into Disjoint Subsets

Francisco Florez-Revuelta

Abstract

This paper presents EvoSplit, a new evolutionary approach for distributing multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers either split a data set randomly or use iterative stratification, a method that aims to preserve the label (or label pair) distribution of the original data set in each subset. With the same aim, this paper first introduces a single-objective evolutionary approach that seeks a split maximizing the similarity to the original data set for each of those distributions independently. Second, it presents a new multi-objective evolutionary algorithm that maximizes the similarity for both distributions (labels and label pairs) simultaneously. Both approaches are validated on well-known multi-label data sets as well as on large image data sets currently used in computer vision and machine learning applications. Compared with iterative stratification, EvoSplit produces better splits according to several measures: Label Distribution, Label Pair Distribution, Examples Distribution, and the number of folds and fold-label pairs with zero positive examples.
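To make the idea concrete, the sketch below evolves an assignment of examples to subsets so that each subset's per-label positive rate approaches that of the whole data set. It is an illustration only, not the algorithm from the paper: the fitness function `label_gap` is a simplified stand-in for the Label Distribution measure, the search is a plain (1+1) evolutionary loop with swap mutation, and all names and parameters (`label_gap`, `evolve_split`, `generations`, `seed`) are hypothetical.

```python
import random


def label_gap(labels, assignment, k):
    """Mean absolute gap between each subset's per-label positive rate
    and the rate in the full data set (lower is better).
    Simplified stand-in, not the paper's Label Distribution measure."""
    n, q = len(labels), len(labels[0])
    overall = [sum(row[j] for row in labels) / n for j in range(q)]
    total = 0.0
    for fold in range(k):
        members = [labels[i] for i in range(n) if assignment[i] == fold]
        for j in range(q):
            # An empty subset is treated as having a rate of 0 for every label.
            rate = sum(row[j] for row in members) / len(members) if members else 0.0
            total += abs(rate - overall[j])
    return total / (k * q)


def evolve_split(labels, k, generations=2000, seed=0):
    """(1+1) evolutionary loop: swap the subsets of two examples and keep
    the swap only if the fitness does not get worse. Swapping preserves
    the subset sizes fixed by the initial balanced assignment."""
    rng = random.Random(seed)
    n = len(labels)
    assignment = [i % k for i in range(n)]  # balanced subset sizes
    rng.shuffle(assignment)
    best = label_gap(labels, assignment, k)
    for _ in range(generations):
        a, b = rng.sample(range(n), 2)
        if assignment[a] == assignment[b]:
            continue  # swapping within one subset changes nothing
        assignment[a], assignment[b] = assignment[b], assignment[a]
        score = label_gap(labels, assignment, k)
        if score <= best:
            best = score
        else:
            # Revert the swap: the mutant is worse than its parent.
            assignment[a], assignment[b] = assignment[b], assignment[a]
    return assignment, best


if __name__ == "__main__":
    # Toy data set: 20 examples, 3 binary labels each (made up for illustration).
    toy = [[1, 0, 0]] * 6 + [[0, 1, 0]] * 6 + [[1, 1, 1]] * 4 + [[0, 0, 0]] * 4
    split, gap = evolve_split(toy, k=2, generations=500, seed=1)
    print(split, gap)
```

A multi-objective variant, as the abstract describes, would track a second fitness value for label pairs and compare candidates by Pareto dominance instead of a single score; that is beyond this sketch.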

Keywords

multi-label data sets - supervised learning - machine learning - evolutionary computation - big data applications

Issue

Volume: 11, Issue: 6 (2021)

Subjects

Civil engineering and construction
Technology

DOI

https://doi.org/10.3390/app11062823
