ARTICLE
TITLE

A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data

Giuseppe Di Modica and Orazio Tomarchio    

Abstract

In the past twenty years, we have witnessed an unprecedented production of data worldwide, which has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to obtain insights from such Big Data quickly and efficiently. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the computing nodes involved are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed in those scenarios (such as the imbalance in the nodes' computing power and in the interconnecting links) in order to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we argue for fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.

Similar articles

       
 
Yogeswaranathan Kalyani, Liam Vorster, Rebecca Whetton and Rem Collier    
In the last decade, digital twin (DT) technology has received considerable attention across various domains, such as manufacturing, smart healthcare, and smart cities. The digital twin represents a digital representation of a physical entity, object, sys...
Journal: Future Internet

 
Hanyue Xu, Kah Phooi Seng, Jeremy Smith and Li Minn Ang    
In the context of smart cities, the integration of artificial intelligence (AI) and the Internet of Things (IoT) has led to the proliferation of AIoT systems, which handle vast amounts of data to enhance urban infrastructure and services. However, the co...
Journal: Future Internet

 
Qian Qu, Mohsen Hatami, Ronghua Xu, Deeraj Nagothu, Yu Chen, Xiaohua Li, Erik Blasch, Erika Ardiles-Cruz and Genshe Chen    
Over the past decade, there has been a remarkable acceleration in the evolution of smart cities and intelligent spaces, driven by breakthroughs in technologies such as the Internet of Things (IoT), edge-fog-cloud computing, and machine learning (ML)/arti...
Journal: Future Internet

 
Jose A. Montenegro and Antonio Muñoz    
In this manuscript, we present EventGeoScout, an innovative framework for collaborative geographic information management, tailored to meet the needs of the dynamically changing landscape of geographic data integration and quality enhancement. EventGeoSc...

 
Jing Liu, Xuesong Hai and Keqin Li    
Massive amounts of data drive the performance of deep learning models, but in practice, data resources are often highly dispersed and bound by data privacy and security concerns, making it difficult for multiple data sources to share their local data dir...
Journal: Future Internet