Future Internet, Vol. 10, Issue 12 (2018)
ARTICLE
TITLE

A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

Ziyun Deng and Tingqin He    

Abstract

To obtain target webpages from a large collection of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three "same relationships" between two nodes, each of which we define. The main innovation of MFPSDDP is that it does not need to know the structure of the webpages in advance. First, we describe the design, which uses a queue and two threads. Then, we present a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. To extract detailed-information webpages from 200,000 webpages downloaded from the website www.jd.com, we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. The Recall Ratio (RR) of MFPSDDP ranks in the middle of the four filtering methods compared. When the number of webpages filtered approaches 200,000, the Precision Ratio (PR) of MFPSDDP is the highest of the four methods, reaching 85.1%, which is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
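The abstract names a dynamic programming computation of the longest common subsequence (LCS) length followed by a similarity formula and a threshold test, but does not give the formula itself. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation: nodes are represented as plain tag-name strings, ordinary equality stands in for the Completely Same Relationship, and the similarity is assumed to be the LCS length divided by the length of the longer sequence. The function names `lcs_length`, `similarity`, and `is_target` are hypothetical.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    # prev[j] is the LCS length of the already-processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]  # LCS with an empty prefix of b is always 0
        for j, y in enumerate(b, start=1):
            if x == y:  # assumption: equality models the "same relationship"
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Assumed formula: LCS length normalized by the longer sequence length."""
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    return lcs_length(a, b) / max(len(a), len(b))


def is_target(page_nodes: Sequence[Hashable],
              reference_nodes: Sequence[Hashable],
              threshold: float = 0.2) -> bool:
    """Keep a page when its similarity to a reference page meets the threshold.

    The 0.2 default mirrors the threshold quoted in the abstract; using >=
    rather than > is an assumption.
    """
    return similarity(page_nodes, reference_nodes) >= threshold


# Hypothetical node sequences for a reference page and a candidate page.
reference = ["html", "body", "div", "p"]
candidate = ["html", "body", "span", "p"]
print(lcs_length(reference, candidate), similarity(reference, candidate))
```

Only two rows of the DP table are kept at any time, so memory grows with the length of one sequence rather than with the product of both lengths, which matters when comparing many pages.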
