JOURNAL
Applied Sciences

ARTICLE

TITLE

Columns Occurrences Graph to Improve Column Prediction in Deep Learning NLIDB

Shanza Abbas

Muhammad Umair Khan

Scott Uk-Jin Lee and Asad Abbas

Abstract

Natural language interfaces to databases (NLIDB) have been a research topic for a decade. Significant data collections are available in the form of databases. To utilize them for research purposes, a system that can translate a natural language query into a structured one can make a huge difference. Efforts toward such systems have been made with pipelining methods for more than a decade: natural language processing techniques integrated with data science methods have been researched as pipelining NLIDB systems. With significant advancements in machine learning and natural language processing, NLIDB with deep learning has emerged as a new research trend in this area. Deep learning has shown potential for rapid growth and improvement in text-to-SQL tasks. In deep learning NLIDB, closing the semantic gap in predicting users' intended columns has arisen as one of the critical and fundamental problems in this research field. Contributions toward this issue have consisted of preprocessing feature inputs and encoding schema elements ahead of the targeted model so that they are more impactful to it. Notwithstanding the significant work contributed toward this problem, it remains one of the critical issues in developing NLIDB. Working toward closing the semantic gap between user intention and predicted columns, we present an approach for deep learning text-to-SQL tasks that includes the occurrence scores of columns in previous queries as an additional input feature. Because overall exact match accuracy depends significantly on column prediction, it can also be improved by improving column prediction accuracy. For this purpose, we extract the query fragments from previous queries' data and obtain the columns' occurrence and co-occurrence scores. These scores are processed as input features for the encoder-decoder-based text-to-SQL model.
These scores contribute, as a factor, the probability that columns and tables have already been used together in the query history. We experimented with our approach on the currently popular text-to-SQL dataset Spider, a complex dataset containing multiple databases. It includes query-question pairs along with schema information. We compared our exact match accuracy with that of a base model using their test and training data splits. Our approach outperformed the base model's accuracy, and accuracy was further boosted in experiments with the pretrained language model BERT.
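
As a rough illustration of what occurrence and co-occurrence scores over a query history could look like, the following minimal Python sketch counts how often each column, and each pair of columns, appears across previous queries and normalizes by the number of queries. This is not the authors' implementation: the function name `column_scores`, the input format (each previous query already reduced to a list of `table.column` names), the example column names, and the normalization are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def column_scores(history):
    """Compute illustrative occurrence and co-occurrence scores.

    `history` is a list of previous queries, each given as a list of
    column names (hypothetical format; extracting columns from SQL text
    is out of scope here). Scores are fractions of queries in which a
    column, or a pair of columns, appears.
    """
    occurrences = Counter()
    co_occurrences = Counter()
    for columns in history:
        unique = sorted(set(columns))  # count each column once per query
        occurrences.update(unique)
        co_occurrences.update(combinations(unique, 2))  # unordered pairs

    total = len(history)
    if total == 0:
        return {}, {}
    occ = {col: n / total for col, n in occurrences.items()}
    co = {pair: n / total for pair, n in co_occurrences.items()}
    return occ, co


# Hypothetical query history over made-up columns.
history = [
    ["singer.name", "singer.age"],
    ["singer.name", "concert.year"],
    ["singer.name", "singer.age", "concert.year"],
]
occ_scores, co_scores = column_scores(history)
```

Pairs are stored as alphabetically sorted tuples, so a lookup such as `co_scores[("singer.age", "singer.name")]` must use the same ordering. How such scores are then encoded as model input features is not specified in the abstract and is left out of this sketch.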

Keywords

deep learning - text-to-SQL - natural language processing - database - machine learning - machine translation

ISSUE

Volume: 11 Issue: 24 (2021)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Acta Scientiarum: Technology
Applied Sciences
Informatics

DOI

https://doi.org/10.3390/app112412116

Similar articles

Leveraging Semantic Text Analysis to Improve the Performance of Transformer-Based Relation Extraction

Marie-Therese Charlotte Evans, Majid Latifi, Mominul Ahsan and Julfikar Haider

Keyword extraction from Knowledge Bases underpins the definition of relevancy in Digital Library search systems. However, it is the pertinent task of Joint Relation Extraction, which populates the Knowledge Bases from which results are retrieved. Recent ...

Journal: Information

A Bibliometric Analysis of Text Mining: Exploring the Use of Natural Language Processing in Social Media Research

Andra Sandu, Liviu-Adrian Cotfas, Aurelia Stanescu and Camelia Delcea

Natural language processing (NLP) plays a pivotal role in modern life by enabling computers to comprehend, analyze, and respond to human language meaningfully, thereby offering exciting new opportunities. As social media platforms experience a surge in g...

Journal: Applied Sciences

There Are Infinite Ways to Formulate Code: How to Mitigate the Resulting Problems for Better Software Vulnerability Detection

Jinghua Groppe, Sven Groppe, Daniel Senf and Ralf Möller

Given a set of software programs, each being labeled either as vulnerable or benign, deep learning technology can be used to automatically build a software vulnerability detector. A challenge in this context is that there are countless equivalent ways to...

Journal: Information

Offensive Text Span Detection in Romanian Comments Using Large Language Models

Andrei Paraschiv, Teodora Andreea Ion and Mihai Dascalu

The advent of online platforms and services has revolutionized communication, enabling users to share opinions and ideas seamlessly. However, this convenience has also brought about a surge in offensive and harmful language across various communication m...

Journal: Information

Elevating Academic Advising: Natural Language Processing of Student Reviews

Omiros Iatrellis, Nicholas Samaras, Konstantinos Kokkinos and Apostolis Xenakis

Academic advising is often pivotal in shaping students' educational experiences and choices. This study leverages natural language processing to quantitatively evaluate reviews of academic advisors, aiming to provide actionable insights on key feedback p...

Journal: Applied System Innovation
