ARTICLE
TITLE

Improving Data Collection on Article Clustering by Using Distributed Focused Crawler

Dani Gunawan    
Amalia Amalia    
Atras Najwan    

Abstract

Data is often collected, or harvested, from the Internet by using a web crawler. A general web crawler can be specialized to focus on a certain topic; this type of web crawler is called a focused crawler. To improve data collection performance, building a focused crawler is not enough, because a focused crawler only makes efficient use of network bandwidth and storage capacity. This research proposes a distributed focused crawler that improves web crawling performance while remaining efficient in network bandwidth and storage capacity. The distributed focused crawler implements crawl scheduling, site ordering to determine the URL queue, and focused crawling by using Naïve Bayes. This research also tests web crawling performance by running the crawler with multiple threads and observing CPU and memory utilization. The conclusion is that web crawling performance decreases when too many threads are used: CPU and memory utilization become very high, while the performance of the distributed focused crawler becomes low.
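
The focused crawling step can be illustrated with a small Naïve Bayes relevance filter. The abstract does not describe the authors' features, training data, or implementation, so the following Python sketch is only one possible reading: the class name `NaiveBayes`, the helper `should_fetch`, the labels `on_topic` and `off_topic`, and the four training sentences are all hypothetical. It uses a multinomial model with add-one smoothing to decide whether a link's text looks relevant enough to be queued.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every token seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        # log P(label) + sum over known tokens of log P(token | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda lbl: self.log_score(text, lbl))


def should_fetch(classifier, link_text):
    # Hypothetical crawl policy: queue a URL only if its text looks on-topic.
    return classifier.predict(link_text) == "on_topic"


# Hypothetical training data, for illustration only.
clf = NaiveBayes()
clf.train("web crawler downloads pages and follows links", "on_topic")
clf.train("search engine index crawler queue", "on_topic")
clf.train("football match score goal team", "off_topic")
clf.train("recipe chicken soup kitchen", "off_topic")

print(should_fetch(clf, "crawler follows links to pages"))
print(should_fetch(clf, "team wins football match"))
```

In a distributed setting, each worker could apply a filter like this before adding a URL to its queue, so that off-topic pages consume neither bandwidth nor storage. How the authors combine the classifier with site ordering and scheduling is not stated in the abstract.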
