Algorithms, Vol. 16, No. 5 (2023)

Subgroup Discovery in Machine Learning Problems with Formal Concepts Analysis and Test Theory Algorithms

Igor Masich, Natalya Rezova, Guzel Shkaberina, Sergei Mironov, Mariya Bartosh and Lev Kazakovtsev

Abstract

Many real-world clustering problems, that is, problems of automatically grouping objects, require a well-grounded solution whose result can be interpreted. A more specific problem is identifying homogeneous subgroups of objects: the number of groups in the dataset is not specified in advance, and the proposed grouping model must be justified and described. As a tool for interpretable machine learning, we consider formal concept analysis (FCA). To reduce a problem with real-valued attributes to one to which FCA can be applied, we search for the optimal number and location of cut points and optimize the support set of attributes. The approach was tested on tasks for which interpretability is important: clustering industrial products (for example, transistors, diodes, and microcircuits) according to primary tests, and gene expression data collected to predict cancerous tumors. For the data under consideration, logical concepts are identified and organized into a lattice of formal concepts. The revealed concepts are evaluated by informativeness indicators and can be regarded as homogeneous subgroups of elements together with their indicative descriptions. The proposed approach thus singles out homogeneous subgroups and describes their characteristics, which can be regarded as stricter norms that the elements of the subgroup satisfy. The approach is compared with the COBWEB algorithm for conceptual clustering, which is aimed at discovering probabilistic concepts. The resulting lattices of logical concepts and probabilistic concepts for the considered datasets are simple and easy to interpret.
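
The pipeline the abstract outlines (cut points turn real attributes into binary ones, then FCA extracts concepts) can be illustrated with a minimal sketch. It is not the authors' implementation: the cut points are hand-picked rather than optimized, the concept enumeration is brute force and only suitable for toy data, and the names `binarize`, `extent`, `intent`, and `formal_concepts` are hypothetical helpers introduced here for illustration.

```python
from itertools import combinations


def binarize(rows, cut_points):
    """Map real-valued rows to sets of binary attributes of the form 'x{j}>={c}'.

    cut_points is an assumed dict {column index: [thresholds]}; in the paper
    the number and location of cut points are optimized, here they are given.
    """
    context = []
    for row in rows:
        attrs = set()
        for j, cuts in cut_points.items():
            for c in cuts:
                if row[j] >= c:
                    attrs.add(f"x{j}>={c}")
        context.append(frozenset(attrs))
    return context


def extent(context, attrs):
    """Indices of all objects that have every attribute in attrs."""
    return frozenset(i for i, obj in enumerate(context) if attrs <= obj)


def intent(context, objects, all_attrs):
    """Attributes shared by every object in objects (all attributes if empty)."""
    shared = set(all_attrs)
    for i in objects:
        shared &= context[i]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset.

    Brute force and exponential in the number of attributes: toy data only.
    """
    all_attrs = frozenset().union(*context)
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), r):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


# Toy data: three objects with two real attributes, one cut point per attribute.
rows = [(1.0, 5.0), (2.0, 6.0), (3.0, 1.0)]
context = binarize(rows, {0: [1.5], 1: [4.0]})
concepts = formal_concepts(context)
for ext, inte in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(inte))
```

Each printed pair is a candidate subgroup (the extent) together with its description as threshold conditions (the intent). Ordering the pairs by inclusion of extents gives the concept lattice; evaluating concepts by informativeness, as the abstract describes, is not covered by this sketch.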
