ARTICLE
TITLE

Skewness-Based Partitioning in SpatialHadoop

Alberto Belussi, Sara Migliorini and Ahmed Eldawy

Abstract

In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data. SpatialHadoop belongs to this group of projects and includes MapReduce implementations of spatial operators such as range queries and spatial join. The MapReduce paradigm is based on the fundamental principle that a task can be parallelized by partitioning data into chunks and performing the same operation on each of them (map phase), finally combining the partial results at the end (reduce phase). Thus, the applied partitioning technique can tremendously affect the performance of a parallel execution, since it is the key to obtaining balanced map tasks and exploiting parallelism as much as possible. When uniformly distributed datasets are considered, this goal can be easily achieved by partitioning the geometries of the input dataset with a regular grid covering the whole reference space; conversely, with skewed datasets, this might not be the right choice and other techniques have to be applied. For instance, SpatialHadoop can also build a global index by means of a Quadtree-based grid or an R-tree-based grid, which are, however, more expensive index structures to build. This paper proposes a technique based on both a box counting function and a heuristic, rooted in theoretical properties and experimental observations, for detecting the degree of skewness of an input spatial dataset and then deciding which partitioning technique to apply in order to improve as much as possible the performance of subsequent operations. Experiments on both synthetic and real datasets are presented to confirm the effectiveness of the proposed approach.
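
The idea of using a box counting function to estimate skewness can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one common formulation, in which the box count for a cell side r is the sum over occupied grid cells of the squared number of points in the cell, and the slope of log(box count) against log(r) is close to 2 for uniformly distributed 2D points and lower for skewed ones. The function names, the threshold of 1.5, and the returned labels "grid" and "quadtree" are hypothetical choices made for illustration; they are not SpatialHadoop API names or values taken from the paper.

```python
import math
from collections import Counter


def box_count(points, cell_side, q=2):
    """Sum over occupied grid cells of (number of points in the cell) ** q."""
    cells = Counter(
        (int(x // cell_side), int(y // cell_side)) for x, y in points
    )
    return sum(count ** q for count in cells.values())


def box_count_slope(points, cell_sides, q=2):
    """Least-squares slope of log(box_count) versus log(cell_side)."""
    xs = [math.log(r) for r in cell_sides]
    ys = [math.log(box_count(points, r, q)) for r in cell_sides]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def choose_partitioning(points, cell_sides, threshold=1.5):
    """Hypothetical heuristic: a slope near 2 suggests a uniform dataset,
    for which a regular grid is cheap and balanced; a lower slope suggests
    skew, for which a more adaptive (and more expensive) index may pay off.
    The threshold is an illustrative assumption, not a value from the paper."""
    slope = box_count_slope(points, cell_sides)
    return "grid" if slope >= threshold else "quadtree"
```

For example, calling `choose_partitioning(points, [1/4, 1/8, 1/16, 1/32])` on points spread uniformly over the unit square should select the regular grid, while points concentrated along a line should fall below the threshold and select the adaptive option.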

Similar articles

       
 
Yongyao Jiang and Chaowei Yang    
With recent advancements, large language models (LLMs) such as ChatGPT and Bard have shown the potential to disrupt many industries, from customer service to healthcare. Traditionally, humans interact with geospatial data through software (e.g., ArcGIS 1...

 
Lei Zhou, Weiye Xiao, Chen Wang, Haoran Wang, pp. 143-161
Human mobility datasets, such as traffic flow data, reveal the connections between urban spaces. A novel framework is proposed to explore the spatial association between urban commercial and residential spaces via consumption travel flows in Shanghai. A ...

 
Eliseo Clementini and Anthony G. Cohn    
RCC*-9 is a mereotopological qualitative spatial calculus for simple lines and regions. RCC*-9 can be easily expressed in other existing models for topological relations and thus can be viewed as a candidate for being a "bridge" model among various appro...

 
Thiago dos Santos Gonçalves, Harald Klammler and Luíz Rogério Bastos Leal    
Aquifer properties, such as hydraulic transmissivity T and its spatial variability, are fundamental for sustainable groundwater exploitation in arid regions. Especially in karst aquifers, spatial variability can be considerable, and the application of ge...
Journal: Water

 
Daxue Kan, Wenqing Yao, Lianju Lyu and Weichiao Huang    
This study aims to improve the level of water ecological civilization (WEC) in the urbanization process based on the data of prefecture-level cities in Jiangxi, China, from 2011 to 2020. This paper applies spatial analysis methods such as the natural fra...
Journal: Water