Information, Vol. 13, No. 11 (2022)
ARTICLE
TITLE

A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification

Jamil Al-Sawwa and Mohammad Almseidin    

Abstract

With the rapid development of internet technology, the amount of collected or generated data has increased exponentially. The sheer volume, complexity, and unbalanced nature of this data challenge the scientific community to extract meaningful information from it within a reasonable time. In this paper, we implement a scalable design of the artificial bee colony algorithm for big data classification using Apache Spark, and we propose a new fitness function to handle unbalanced data. Two experiments were performed on real unbalanced datasets to assess the performance and scalability of the proposed algorithm. The performance results reveal that the proposed fitness function deals efficiently with unbalanced datasets and statistically outperforms the existing fitness function in terms of G-mean and F1-Score. The scalability results demonstrate that the Spark-based design achieves speedup and scaleup results that are very close to optimal and scales efficiently with increasing data size.
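For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of their standard textbook definitions. It is not the fitness function proposed in the paper, which the abstract does not specify. The function names, argument names, and the sample confusion-matrix counts are illustrative assumptions.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (standard definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def speedup(time_one_node, time_n_nodes):
    """Same dataset: running time on one node divided by time on n nodes."""
    return time_one_node / time_n_nodes


def scaleup(time_base, time_scaled):
    """Time for the base data on one node divided by the time for
    n-times the data on n nodes; a value of 1.0 is the ideal."""
    return time_base / time_scaled


# Hypothetical counts: 10 minority samples, 90 majority samples.
# A classifier that always predicts the majority class is 90% accurate,
# yet both measures below fall to zero.
always_majority = {"tp": 0, "fn": 10, "tn": 90, "fp": 0}
gm = g_mean(**always_majority)
f1 = f1_score(always_majority["tp"], always_majority["fn"], always_majority["fp"])
```

Both measures penalize a classifier that ignores the minority class, which is why they are commonly preferred over plain accuracy on unbalanced datasets.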

Similar articles

       
 
Antonio Maci, Alessandro Santorsola, Antonio Coscia and Andrea Iannacone    
Web phishing is a form of cybercrime aimed at tricking people into visiting malicious URLs to exfiltrate sensitive data. Since the structure of a malicious URL evolves over time, phishing detection mechanisms that can adapt to such variations are paramou...
Journal: Computers

 
Mantas Bacevicius and Agne Paulauskaite-Taraseviciene    
Various machine learning algorithms have been applied to network intrusion classification problems, including both binary and multi-class classifications. Despite the existence of numerous studies involving unbalanced network intrusion datasets, such as ...
Journal: Applied Sciences

 
Massimiliano Greco, Pier Francesco Caruso, Sofia Spano, Gianluigi Citterio, Antonio Desai, Alberto Molteni, Romina Aceto, Elena Costantini, Antonio Voza and Maurizio Cecconi    
Background: Sepsis is one of the major causes of in-hospital death, and is frequent in patients presenting to the emergency department (ED). Early identification of high-risk septic patients is critical. Machine learning (ML) techniques have been propose...
Journal: Algorithms

 
Dajana Bartulovic, Sanja Steiner, Dario Fakleš and Martina Mavrin Jelicic    
Conducting flight operations at the pace of air traffic relies on shift work, overtime work, work at night, work in different and numerous time zones, and unbalanced flight crew schedules. Such working hours and workload settings can cause disturbances o...
Journal: Aerospace

 
Omar Azib Alkhudaydi, Moez Krichen and Ans D. Alghamdi    
With the increasing severity and frequency of cyberattacks, the rapid expansion of smart objects intensifies cybersecurity threats. The vast communication traffic data between Internet of Things (IoT) devices presents a considerable challenge in defendin...
Journal: Information