Algorithms, Vol. 16, No. 12 (2023)

Improving Clustering Accuracy of K-Means and Random Swap by an Evolutionary Technique Based on Careful Seeding

Libero Nigro and Franco Cicirelli    

Abstract

K-Means is a "de facto" standard clustering algorithm due to its simplicity and efficiency. However, K-Means strongly depends on the initialization of the centroids (the seeding method) and often gets stuck in a sub-optimal local solution. In fact, K-Means mainly acts as a local refiner of the centroids and is unable to move centroids across the whole data space. Random Swap was defined to go beyond K-Means: its modus operandi integrates K-Means into a global strategy of centroid management, which can often generate a clustering solution close to the global optimum. This paper proposes an approach that extends both K-Means and Random Swap and improves clustering accuracy through an evolutionary technique and careful seeding. Two new algorithms are proposed: Population-Based K-Means (PB-KM) and Population-Based Random Swap (PB-RS). Both algorithms consist of two steps: first, a population of J candidate solutions is built; then, the candidate centroids are repeatedly recombined toward a final accurate solution. The paper motivates the design of PB-KM and PB-RS, outlines their current implementation in Java based on parallel streams, and demonstrates the achievable clustering accuracy on both synthetic and real-world datasets.
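The abstract describes three building blocks: careful seeding, K-Means as a local refiner, and Random Swap as a global strategy that wraps K-Means. The Java sketch below shows common textbook forms of these blocks. It is not the authors' PB-KM or PB-RS code. In particular, it leaves out the population of J candidate solutions and the recombination step, because the abstract does not describe them.

The sketch rests on these assumptions:
- K-Means++ stands in for "careful seeding".
- Each swap is refined with two K-Means iterations.
- The class and method names (`RandomSwapSketch`, `seedPlusPlus`, `kMeans`, `randomSwap`) are hypothetical.

Because the abstract mentions parallel streams, the assignment step and the SSE computation use `IntStream.parallel()`.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

// Hypothetical sketch, NOT the authors' PB-KM/PB-RS code: K-Means++ seeding,
// Lloyd's K-Means and a basic Random Swap loop. Points are rows of double[][].
final class RandomSwapSketch {
    static double dist2(double[] a, double[] b) { // squared Euclidean distance
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }

    // Index of the centroid closest to p (ties go to the lower index).
    static int nearest(double[] p, double[][] c) {
        int best = 0;
        for (int k = 1; k < c.length; k++) { if (dist2(p, c[k]) < dist2(p, c[best])) best = k; }
        return best;
    }

    // Sum of squared errors, computed with a parallel stream over the points.
    static double sse(double[][] x, double[][] c) {
        return IntStream.range(0, x.length).parallel()
                .mapToDouble(i -> dist2(x[i], c[nearest(x[i], c)])).sum();
    }

    // K-Means++ seeding (assumed as the "careful" seeding): each new centroid is
    // a data point drawn with probability proportional to its squared distance D(x)^2.
    static double[][] seedPlusPlus(double[][] x, int k, Random rnd) {
        double[][] c = new double[k][];
        c[0] = x[rnd.nextInt(x.length)].clone();
        double[] d2 = new double[x.length];
        for (int j = 1; j < k; j++) {
            double[][] chosen = Arrays.copyOf(c, j);
            double total = 0;
            for (int i = 0; i < x.length; i++) {
                d2[i] = dist2(x[i], chosen[nearest(x[i], chosen)]);
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = x.length - 1;
            for (int i = 0; i < x.length; i++) {
                if (r < d2[i]) { pick = i; break; }
                r -= d2[i];
            }
            c[j] = x[pick].clone();
        }
        return c;
    }

    // Lloyd's K-Means: assign points to the nearest centroid (parallel stream), move
    // each centroid to its cluster mean, stop when no centroid moves or after maxIter.
    static double[][] kMeans(double[][] x, double[][] init, int maxIter) {
        int k = init.length, dim = x[0].length;
        double[][] c = init;
        for (int it = 0; it < maxIter; it++) {
            final double[][] cur = c;
            int[] label = IntStream.range(0, x.length).parallel()
                    .map(i -> nearest(x[i], cur)).toArray();
            double[][] sum = new double[k][dim];
            int[] cnt = new int[k];
            for (int i = 0; i < x.length; i++) {
                cnt[label[i]]++;
                for (int d = 0; d < dim; d++) sum[label[i]][d] += x[i][d];
            }
            double[][] next = new double[k][];
            boolean moved = false;
            for (int j = 0; j < k; j++) {
                next[j] = cur[j].clone(); // an empty cluster keeps its old centroid
                if (cnt[j] > 0) for (int d = 0; d < dim; d++) next[j][d] = sum[j][d] / cnt[j];
                if (!Arrays.equals(next[j], cur[j])) moved = true;
            }
            c = next;
            if (!moved) break;
        }
        return c;
    }

    // Random Swap: move a random centroid onto a random data point, refine with two
    // K-Means iterations (an assumed setting) and keep the trial only if SSE drops.
    static double[][] randomSwap(double[][] x, int k, int swaps, Random rnd) {
        double[][] best = kMeans(x, seedPlusPlus(x, k, rnd), 100);
        double bestSse = sse(x, best);
        for (int s = 0; s < swaps; s++) {
            double[][] trial = best.clone(); // shallow copy: rows are never mutated
            trial[rnd.nextInt(k)] = x[rnd.nextInt(x.length)].clone();
            trial = kMeans(x, trial, 2);
            double e = sse(x, trial);
            if (e < bestSse) { best = trial; bestSse = e; }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] x = { {0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10} }; // toy data
        System.out.printf("SSE = %.4f%n", sse(x, randomSwap(x, 2, 50, new Random(42))));
    }
}
```

Random Swap accepts a trial only when its SSE decreases. The returned solution is therefore never worse than the initial seeding plus K-Means run. This mirrors the abstract's point that Random Swap uses K-Means as a local refiner inside a global search.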

Similar Articles

       
 
Ramasubbareddy Somula, Yongyun Cho and Bhabendu Kumar Mohanta    
In recent years, the Internet of Things (IoT) has transformed human life by improving quality of life and revolutionizing all business sectors. The sensor nodes in IoT are interconnected to ensure data transfer to the sink node over the network. Owing to...
Journal: Information

 
Tzu-Ying Chiu and Ying-Chih Lai    
The study of managing risk in aviation is the key to improving flight safety. Compared to the other flight operation phases, the approach and landing phases are more critical and dangerous. This study aims to detect and analyze unstable approaches in Tai...
Journal: Aerospace

 
Haijun Liang, Shiyu Zhang and Jianguo Kong    
The air traffic control (ATC) network's airspace sector is a crucial component of air traffic management. The increasing demand for air transportation services has made limited airspace a significant challenge to sustainable and efficient air transport o...
Journal: Aerospace

 
Konstantinos Charmanas, Nikolaos Mittas and Lefteris Angelis    
Security vulnerabilities constitute one of the most important weaknesses of hardware and software security that can cause severe damage to systems, applications, and users. As a result, software vendors should prioritize the most dangerous and impactful ...
Journal: Information

 
Ayman Taha, Bernard Cosgrave and Susan Mckeever    
Insurance is a data-rich sector, hosting large volumes of customer data that is analysed to evaluate risk. Machine learning techniques are increasingly used in the effective management of insurance risk. Insurance datasets by their nature, however, are o...
Journal: Applied Sciences