Informatics, Vol. 4, No. 3 (2017)
Title

Big Data Management with Incremental K-Means Trees–GPU-Accelerated Construction and Visualization

Jun Wang, Alla Zelenyuk, Dan Imre and Klaus Mueller

Abstract

While big data is revolutionizing scientific research, the tasks of data management and analytics are becoming more challenging than ever. One way to mitigate the difficulty is to obtain the multilevel hierarchy embedded in the data. Knowing the hierarchy not only reveals the nature of the data but is also often the first step in big data analytics. However, current algorithms for learning the hierarchy typically do not scale to large volumes of high-dimensional data. To tackle this challenge, we propose a new scalable approach for constructing the tree structure from data. Our method builds the tree bottom-up with an adapted incremental k-means. By referencing the distribution of point distances, one can flexibly control the height of the tree and the branching of each node. Dimension reduction is also performed as a preprocessing step to further improve computing efficiency. The algorithm has a parallel design and is implemented with CUDA (Compute Unified Device Architecture), so that it can be applied efficiently to big data. We test the algorithm on two real-world datasets and visualize the results with extended circular dendrograms and other visualization techniques.
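The abstract does not spell out the algorithm, so the following is only a minimal CPU sketch of the general idea, not the authors' CUDA implementation. It assumes a single-pass incremental k-means in which a point joins its nearest cluster when it lies within a distance threshold and otherwise starts a new cluster, and it assumes that the threshold for each tree level is taken as a quantile of sampled pairwise distances. The names `incremental_kmeans`, `distance_quantile`, `build_tree`, and the `quantiles` parameter are hypothetical and chosen for illustration.

```python
import math
import random


def incremental_kmeans(points, threshold):
    """Single pass: a point joins its nearest cluster if within
    `threshold`, otherwise it starts a new cluster (assumed rule)."""
    centroids, counts, labels = [], [], []
    for p in points:
        best, best_d = -1, float("inf")
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= threshold:
            # Running-mean update of the winning centroid.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [ci + (pi - ci) / n
                               for ci, pi in zip(centroids[best], p)]
        else:
            centroids.append(list(p))
            counts.append(1)
            best = len(centroids) - 1
        labels.append(best)
    return centroids, labels


def distance_quantile(points, q, samples=2000, seed=0):
    """Estimate the q-quantile of pairwise distances from random pairs."""
    rng = random.Random(seed)
    dists = sorted(math.dist(*rng.sample(points, 2)) for _ in range(samples))
    return dists[min(int(q * len(dists)), len(dists) - 1)]


def build_tree(points, quantiles=(0.05, 0.5)):
    """Bottom-up construction: cluster the points, then cluster the
    resulting centroids with a looser threshold, and so on.

    Returns (levels, roots), where levels[k][i] is the parent index of
    node i at level k and roots are the top-level centroids.
    Simplification: upper levels treat each centroid as one unweighted point.
    """
    levels, nodes = [], [list(p) for p in points]
    for q in quantiles:
        if len(nodes) < 2:
            break
        threshold = distance_quantile(nodes, q)
        nodes, parents = incremental_kmeans(nodes, threshold)
        levels.append(parents)
    return levels, nodes
```

In this sketch the number of entries in `quantiles` bounds the tree height, and a larger quantile at a given level merges more nodes under each parent, which is one plausible reading of controlling height and branching from the distance distribution. The inner nearest-centroid search is the part that a GPU version would most naturally parallelize.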
