ARTICLE
TITLE

A Model for Enhancing Unstructured Big Data Warehouse Execution Time

Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid

Abstract

Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of data generated by current applications requires new data warehousing systems. In the big data setting, existing warehouse systems must be adapted to overcome new issues and limitations. The main drawbacks of traditional Extract-Transform-Load (ETL) are that it cannot process very large volumes of data and that its execution time is very high when the data are unstructured. This paper presents a new model consisting of four layers, Extract-Clean-Load-Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text. The model aims to reduce execution time, which is evaluated experimentally. ECLT is implemented and tested with Spark, used from Python. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s. When the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
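
The four-layer ordering named in the abstract can be illustrated with a minimal sketch. This is plain Python rather than Spark, and it is not the authors' implementation: the abstract does not say what each layer does internally, so the function names (`extract`, `clean`, `load`, `transform`, `run_eclt`), the cleaning rules (lowercasing, stripping punctuation) and the word-count transform are all assumptions chosen only to show cleaning happening before loading and transformation happening last.

```python
import re
from collections import Counter


def extract(raw_documents):
    """Extract: pull raw text records from a source (here, an in-memory list)."""
    return [doc for doc in raw_documents if doc is not None]


def clean(records):
    """Clean: normalize text before loading.

    Assumed rules: lowercase, replace punctuation with spaces,
    collapse whitespace, and drop records that end up empty.
    """
    cleaned = []
    for text in records:
        text = re.sub(r"[^\w\s]", " ", text.lower())
        text = re.sub(r"\s+", " ", text).strip()
        if text:
            cleaned.append(text)
    return cleaned


def load(records, warehouse):
    """Load: store cleaned records (a plain list stands in for the warehouse)."""
    warehouse.extend(records)
    return warehouse


def transform(warehouse):
    """Transform: analyze the loaded data (assumed here: word frequencies)."""
    counts = Counter()
    for text in warehouse:
        counts.update(text.split())
    return counts


def run_eclt(raw_documents):
    """Run the four stages in ECLT order: Extract, Clean, Load, Transform."""
    warehouse = []
    load(clean(extract(raw_documents)), warehouse)
    return transform(warehouse)


# Hypothetical input: two usable documents, one missing, one blank.
word_counts = run_eclt(["Big data, BIG warehouses!", None, "   ", "Data lakes."])
```

Because cleaning runs before loading in this sketch, the blank and missing records never reach the store, so the transform stage only touches usable text. Whether that is the source of the speed-up reported in the abstract is not something the abstract states.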
