JOURNAL
Future Internet

ARTICLE

TITLE

Exploiting Machine Learning for Improving In-Memory Execution of Data-Intensive Workflows on Parallel Machines

Riccardo Cantini

Fabrizio Marozzo

Alessio Orsino

Domenico Talia and Paolo Trunfio

Abstract

Workflows are widely used to orchestrate the complex sets of operations required to handle and process huge amounts of data. Parallel processing is often vital to reduce execution time when complex data-intensive workflows must be run efficiently, and in-memory processing can further accelerate execution. However, optimization techniques are necessary to fully exploit in-memory processing and to avoid the performance drops caused by memory saturation events. This paper proposes a novel solution, called the Intelligent In-memory Workflow Manager (IIWM), for optimizing the in-memory execution of data-intensive workflows on parallel machines. IIWM is based on two complementary strategies: (1) a machine learning strategy for predicting the memory occupancy and execution time of workflow tasks; (2) a scheduling strategy that allocates tasks to a computing node, taking into account the (predicted) memory occupancy and execution time of each task and the memory available on that node. The effectiveness of the machine learning-based predictor and the scheduling strategy was demonstrated experimentally using as a testbed Apache Spark, a high-performance Big Data processing framework that exploits in-memory computing to speed up the execution of large-scale applications. In particular, two synthetic workflows were prepared to test the robustness of the IIWM in scenarios characterized by a high level of parallelism and a limited amount of memory reserved for execution. Furthermore, a real data analysis workflow was used as a case study to better assess the benefits of the proposed approach. Thanks to its high accuracy in predicting the resources used at runtime, the IIWM was able to avoid disk writes caused by memory saturation, outperforming a traditional strategy in which only dependencies among tasks are taken into account. Specifically, the IIWM achieved up to a 31% and a 40% reduction of makespan and a performance improvement of up to 1.45× and 1.66× on the synthetic workflows and the real case study, respectively.
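
The scheduling idea in strategy (2) can be illustrated with a minimal sketch. This is not the IIWM algorithm from the paper: it swaps in a simple first-fit-decreasing heuristic, considers predicted memory only (not execution time), and every name in it (`Task`, `Node`, `schedule`, the memory figures) is a hypothetical chosen for illustration. The predicted memory values stand in for the output of a learned predictor.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    predicted_memory: float  # hypothetical predictor output, e.g. in GB


@dataclass
class Node:
    name: str
    memory_capacity: float  # memory reserved for execution on this node
    tasks: list = field(default_factory=list)

    @property
    def free_memory(self) -> float:
        # Capacity minus the predicted footprint of tasks already placed here.
        return self.memory_capacity - sum(t.predicted_memory for t in self.tasks)


def schedule(tasks, nodes):
    """Place each task on the first node whose free memory covers its
    predicted footprint, largest tasks first (first-fit decreasing).
    Returns the tasks that fit on no node and must wait for a later round."""
    deferred = []
    for task in sorted(tasks, key=lambda t: t.predicted_memory, reverse=True):
        target = next(
            (n for n in nodes if n.free_memory >= task.predicted_memory), None
        )
        if target is None:
            deferred.append(task)  # would saturate memory on every node
        else:
            target.tasks.append(task)
    return deferred


# Illustrative input: four independent tasks, two nodes with 10 GB each.
nodes = [Node("node-1", 10.0), Node("node-2", 10.0)]
leftover = schedule(
    [Task("A", 6.0), Task("B", 5.0), Task("C", 4.0), Task("D", 3.0)], nodes
)
for node in nodes:
    print(node.name, [t.name for t in node.tasks], "free:", node.free_memory)
print("deferred:", [t.name for t in leftover])
```

A dependency-only scheduler would run all four tasks as soon as they are ready, regardless of their combined footprint; the sketch instead defers any task whose predicted memory would exceed what a node has left, which is the kind of saturation the abstract says the IIWM avoids.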

Keywords

workflow - data-intensive - in-memory - machine learning - Apache Spark - scheduling

ISSUE

Volume: 13, Issue: 5 (2021)

SUBJECTS

INFRASTRUCTURE

DOI

https://doi.org/10.3390/fi13050121
