ARTICLE
TITLE

Anomaly detection in real-time streaming data processing

D.E. Savitsky    
M.E. Dunaev    
K.S. Zaytsev    

Abstract

The purpose of this work is to study methods for detecting anomalies in real time during the distributed processing of data streams. To do this, the authors modified the K-Means algorithm, calling the result real-time K-Means, and compared its effectiveness with that of K-Means from the MLlib library of the Apache Spark framework. The comparison confirmed the effectiveness of the proposed modification. For the experiments, a dedicated dataset was built from about 1000 measurements of Apache Kafka server log metrics, collected with one topic, two producers, and one consumer. Anomalous fragments were added to this dataset, characterized by a large number of messages per moment of time and/or a large message size. The dataset values were pre-processed to align the scales of the metrics and to exclude correlated metrics. The results show that the K-Means algorithm developed by the authors is effective for anomaly detection when detection time is taken into account.
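The abstract does not describe how the authors' modification works, so the following is only a minimal sketch of the general idea of K-Means-based anomaly detection on a stream, not the authors' real-time K-Means and not the Spark MLlib API. It assumes a sequential centroid update and a fixed distance cut-off. The class `OnlineKMeansDetector`, the `threshold` parameter, and the synthetic (message rate, message size) values are all hypothetical.

```python
import math


class OnlineKMeansDetector:
    """Sequential k-means that flags points far from every centroid.

    Illustrative only: this is neither the authors' real-time K-Means
    nor the Spark MLlib implementation.
    """

    def __init__(self, k, threshold):
        self.k = k
        self.threshold = threshold  # assumed cut-off on distance to the nearest centroid
        self.centroids = []         # one coordinate list per cluster
        self.counts = []            # number of points absorbed by each cluster

    def process(self, point):
        """Return True if the point is flagged as anomalous, else update the model."""
        point = list(point)
        # Seed the model with the first k points of the stream.
        if len(self.centroids) < self.k:
            self.centroids.append(point)
            self.counts.append(1)
            return False
        distances = [math.dist(point, c) for c in self.centroids]
        nearest = min(range(self.k), key=distances.__getitem__)
        if distances[nearest] > self.threshold:
            return True  # flagged points do not move the centroids
        # Incremental mean update: c <- c + (x - c) / n
        self.counts[nearest] += 1
        rate = 1.0 / self.counts[nearest]
        self.centroids[nearest] = [
            c + rate * (x - c) for c, x in zip(self.centroids[nearest], point)
        ]
        return False


def synthetic_stream(n=200):
    """Yield made-up, already scaled (message rate, message size) pairs."""
    for i in range(n):
        if i > 0 and i % 50 == 0:
            yield (3.0, 2.5)  # injected burst: many large messages
            continue
        base = (0.2, 0.2) if i % 2 == 0 else (0.8, 0.3)
        jitter = 0.05 * math.sin(i)
        yield (base[0] + jitter, base[1] - jitter)


detector = OnlineKMeansDetector(k=2, threshold=0.5)
flagged = [i for i, p in enumerate(synthetic_stream()) if detector.process(p)]
print(flagged)  # prints [50, 100, 150]
```

Each point is handled once, in arrival order, so the decision time per measurement stays constant. Skipping the centroid update for flagged points is a design choice made here so that bursts do not drag the clusters toward the anomalies.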

Similar articles

       
 
George Papageorgiou, Vangelis Sarlis and Christos Tjortjis    
This study utilized advanced data mining and machine learning to examine player injuries in the National Basketball Association (NBA) from 2000–01 to 2022–23. By analyzing a dataset of 2296 players, including sociodemographics, injury records, and financ...
Journal: Information

 
Juan Luis Pérez-Ruiz, Yu Tang, Igor Loboda and Luis Angel Miró-Zárate    
In the field of aircraft engine diagnostics, many advanced algorithms have been proposed over the last few years. However, there is still wide room for improvement, especially in the development of more integrated and complete engine health management sy...
Journal: Aerospace

 
Urszula Libal and Pawel Biernacki    
An automatic honey bee classification system based on audio signals for tracking the frequency of workers and drones entering and leaving a hive.
Journal: Applied Sciences

 
Mohamed Shenify, Fokrul Alom Mazarbhuiya and A. S. Wungreiphi    
There are many applications of anomaly detection in the Internet of Things domain. IoT technology consists of a large number of interconnecting digital devices not only generating huge data continuously but also making real-time computations. Since IoT d...
Journal: Applied Sciences

 
Woo-Hyun Choi and Jongwon Kim    
Industrial control systems (ICSs) play a crucial role in managing and monitoring critical processes across various industries, such as manufacturing, energy, and water treatment. The connection of equipment from various manufacturers, complex communicati...