Experimental evaluation of the temporal efficiency of big data processing for specified storage formats

V.A. Belov

E.V. Nikulchev

Abstract

Choosing a data storage format is one of the most important tasks in designing a modern big data processing platform. The choice rests on performance criteria that depend on the class of objects being stored and on the platform's requirements; among the most important is the time spent on various data processing operations. This paper studies the five most popular big data storage formats (Avro, CSV, JSON, ORC, and Parquet), proposes an experimental test bench for assessing their time efficiency, and presents a comparative analysis of the experimentally measured characteristics of these formats. The experiment covers basic data processing operations performed with the Apache Spark framework. A format selection algorithm is developed on the basis of the hierarchy analysis method. The result is a methodology, combining experimental parameter estimates with hierarchy analysis, for choosing among alternative big data storage formats according to the time efficiency of basic operations in the Apache Hadoop system using Apache Spark.
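The hierarchy analysis method mentioned in the abstract is commonly known in English as the analytic hierarchy process (AHP). As a minimal sketch of how such a selection step might work, the Python example below derives priority weights for three formats from a pairwise comparison matrix. Everything in it is an assumption made for illustration: the function name `ahp_priorities`, the choice of three formats, and the comparison values are hypothetical and are not taken from the paper, and the row geometric mean used here is a common approximation of the AHP priority vector, not necessarily the calculation the authors performed.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights with the row geometric mean method.

    matrix[i][j] states how strongly alternative i is preferred over
    alternative j; a consistent matrix has matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    # Geometric mean of each row, then normalise so the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical pairwise comparisons for a single criterion such as read time.
# These numbers are illustrative only and do not come from the paper.
formats = ["avro", "csv", "parquet"]
comparisons = [
    [1.0, 3.0, 0.5],     # avro compared with avro, csv, parquet
    [1 / 3, 1.0, 0.25],  # csv compared with avro, csv, parquet
    [2.0, 4.0, 1.0],     # parquet compared with avro, csv, parquet
]

weights = ahp_priorities(comparisons)
best = max(zip(formats, weights), key=lambda pair: pair[1])[0]

for name, weight in zip(formats, weights):
    print(f"{name}: {weight:.3f}")
print("preferred format under these assumed comparisons:", best)
```

In a fuller setting, the comparison values would be derived from measured operation times for each format, and the per-criterion weights would be combined across all criteria in the hierarchy.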

PAGES

pp. 95 - 102

ISSUE

Volume: 9 Issue: 9 Part: 0 (2021)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY
