ARTICLE
TITLE

A Comprehensive Spark-Based Layer for Converting Relational Databases to NoSQL

Manal A. Abdel-Fattah, Wael Mohamed and Sayed Abdelgaber

Abstract

The continuous, massive growth in the size, variety, and velocity of data is known as big data. Relational databases have a limited ability to handle big data. Consequently, not only structured query language (NoSQL) databases have been adopted for big data because, unlike traditional relational databases, they represent data in diverse models and use a variety of query languages. Using NoSQL has therefore become essential, and many studies have proposed layers for converting relational databases to NoSQL; however, most of them targeted only one or two NoSQL models and evaluated their layers on a single node rather than in a distributed environment. This study proposes a Spark-based layer for mapping relational databases to NoSQL models, focusing on document, column, and key-value databases. The proposed layer comprises two parts. The first part converts relational databases to document, column, and key-value databases and encompasses two phases: a metadata analyzer of relational databases, and Spark-based transformation and migration. The second part focuses on executing structured query language (SQL) queries on the NoSQL databases. The proposed layer was applied and compared with Unity, which has similar components and features and supports sub-queries and join operations in a single-node environment. The experimental results show that the proposed layer outperformed Unity in query execution time by a factor of three. In addition, the proposed layer was applied to multi-node clusters under different scenarios, and the results show that integrating the Spark cluster with NoSQL databases on multi-node clusters provided better read and write performance than a single node as the dataset size increased.
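To make the two conversion phases more concrete, the following is a minimal sketch of a metadata analyzer followed by a mapping to document, key-value, and column shapes. It is not the authors' implementation: plain Python and the standard-library sqlite3 module stand in for Spark and for the real source and target databases, and the toy schema, the function names, and the choice to embed child rows inside their parent document are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        total REAL
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")


def analyze_metadata(conn):
    """Phase 1 stand-in: collect columns, primary keys, and foreign keys."""
    meta = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    for t in tables:
        cols = conn.execute(f"PRAGMA table_info({t})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({t})").fetchall()
        meta[t] = {
            "columns": [c[1] for c in cols],
            "primary_key": [c[1] for c in cols if c[5] > 0],
            # (local column, referenced table, referenced column)
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return meta


def rows(conn, table, columns):
    """Read a table as a list of dicts, ordered by its first column."""
    cur = conn.execute(
        f"SELECT {', '.join(columns)} FROM {table} ORDER BY {columns[0]}")
    return [dict(zip(columns, r)) for r in cur]


def to_documents(conn, meta, parent, child):
    """Document model (assumed mapping): embed child rows in their parent."""
    fk_col = next(f for f, ref, _ in meta[child]["foreign_keys"] if ref == parent)
    pk = meta[parent]["primary_key"][0]
    children = rows(conn, child, meta[child]["columns"])
    docs = []
    for p in rows(conn, parent, meta[parent]["columns"]):
        doc = dict(p)
        doc[child] = [{k: v for k, v in c.items() if k != fk_col}
                      for c in children if c[fk_col] == p[pk]]
        docs.append(doc)
    return docs


def to_key_value(conn, meta, table):
    """Key-value model: key is 'table:pk', value holds the other columns."""
    pk = meta[table]["primary_key"][0]
    return {f"{table}:{r[pk]}": {k: v for k, v in r.items() if k != pk}
            for r in rows(conn, table, meta[table]["columns"])}


def to_column_cells(conn, meta, table):
    """Column model: one (row key, column name, value) cell per non-key column."""
    pk = meta[table]["primary_key"][0]
    return [(r[pk], k, v)
            for r in rows(conn, table, meta[table]["columns"])
            for k, v in r.items() if k != pk]


meta = analyze_metadata(conn)
docs = to_documents(conn, meta, "customer", "orders")
kv = to_key_value(conn, meta, "customer")
cells = to_column_cells(conn, meta, "orders")
```

In a distributed setting, the per-table reads and the row-level mappings would be the natural candidates for parallel execution, which is the role the abstract assigns to Spark; the metadata step stays small because it touches only the schema catalog.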
