Algorithms for Table Structure Recognition

Yosveni Escalona

Abstract

Tables are widely used to organize and publish data. The Web, for example, holds an enormous number of tables, published in HTML, embedded in PDF documents, or available for download from Web pages. However, tables are not always easy to interpret because of the variety of features and formats they use, and a large number of methods and tools have been developed to interpret them. This work presents the implementation of an algorithm, based on Conditional Random Fields (CRFs), that classifies the rows of a table as header rows, data rows, or metadata rows. The implementation is complemented by two algorithms for recognizing tables in a spreadsheet document, one based on rules and the other on region detection. Finally, the work describes the results and benefits obtained by applying the implemented algorithm to HTML tables obtained from the Web and to spreadsheet tables downloaded from the Brazilian National Petroleum Agency.
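To make the row-classification idea concrete, the following is a minimal Python sketch of how a linear-chain CRF could assign one of the three labels to each row, using Viterbi decoding to find the highest-scoring label sequence. It is not the implementation described in the abstract. The two features (`fill_ratio`, `numeric_ratio`), the hand-picked weights, the helper names, and the sample table are all assumptions made for illustration; a trained CRF would learn its weights from annotated tables and would likely use a much richer feature set.

```python
LABELS = ["metadata", "header", "data"]

# Hypothetical, hand-picked weights (NOT learned, NOT from the paper).
# EMISSION[label][feature]: how strongly a feature supports a label.
EMISSION = {
    "metadata": {"bias": 1.5, "fill_ratio": -2.0, "numeric_ratio": -1.0},
    "header": {"bias": 0.0, "fill_ratio": 2.0, "numeric_ratio": -3.0},
    "data": {"bias": -1.0, "fill_ratio": 1.0, "numeric_ratio": 3.0},
}
# TRANSITION[prev][cur]: how plausible it is for `cur` to follow `prev`.
TRANSITION = {
    "metadata": {"metadata": 0.5, "header": 0.5, "data": -1.0},
    "header": {"metadata": -1.0, "header": 0.0, "data": 1.0},
    "data": {"metadata": 0.0, "header": -2.0, "data": 1.0},
}


def is_number(text):
    """Return True if the cell text parses as a number (commas ignored)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def row_features(row):
    """Assumed features for one row, given as a list of cell strings."""
    cells = [c.strip() for c in row]
    filled = [c for c in cells if c]
    numeric = [c for c in filled if is_number(c)]
    return {
        "bias": 1.0,
        "fill_ratio": len(filled) / len(cells) if cells else 0.0,
        "numeric_ratio": len(numeric) / len(filled) if filled else 0.0,
    }


def emission_score(label, feats):
    """Weighted sum of the row's features for a candidate label."""
    return sum(EMISSION[label][name] * value for name, value in feats.items())


def classify_rows(rows):
    """Viterbi decoding: the label sequence with the highest total score."""
    if not rows:
        return []
    feats = [row_features(r) for r in rows]
    score = {lab: emission_score(lab, feats[0]) for lab in LABELS}
    backpointers = []
    for f in feats[1:]:
        new_score, ptr = {}, {}
        for cur in LABELS:
            # Best previous label for reaching `cur` at this row.
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[p][cur])
            new_score[cur] = score[prev] + TRANSITION[prev][cur] + emission_score(cur, f)
            ptr[cur] = prev
        score = new_score
        backpointers.append(ptr)
    # Trace the best path back from the best final label.
    path = [max(LABELS, key=lambda lab: score[lab])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up sample table, for illustration only.
rows = [
    ["Monthly oil production", "", ""],
    ["Field", "Month", "Barrels"],
    ["A", "Jan", "1200"],
    ["B", "Jan", "950"],
    ["Source: hypothetical example", "", ""],
]
print(classify_rows(rows))
# ['metadata', 'header', 'data', 'data', 'metadata']
```

The transition weights are what distinguish this from classifying each row in isolation: a row whose features are ambiguous between header and data is pulled toward data when it follows a header row. Decoding needs only the unnormalized scores, so the sketch omits the partition function that CRF training would require.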

Issue: 25 Part: 0 (2021)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY
