ARTICLE

From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough

Mourad Mars    

Abstract

With the recent advances in deep learning, different approaches to improving pre-trained language models (PLMs) have been proposed. PLMs have advanced the state-of-the-art (SOTA) on various natural language processing (NLP) tasks such as machine translation, text classification, question answering, text summarization, information retrieval, recommendation systems, and named entity recognition. In this paper, we provide a comprehensive review of earlier embedding models as well as recent breakthroughs in the field of PLMs. We then analyse and contrast the various models, examining how they were built (number of parameters, compression techniques, etc.). Finally, we discuss the major open issues and future research directions for each of these areas.