JOURNAL
Applied Sciences

ARTICLE

TITLE

The Multi-Hot Representation-Based Language Model to Maintain Morpheme Units

Ju-Sang Lee, Joon-Choul Shin and Choel-Young Ock

Abstract

Language models have driven rapid gains in Natural Language Processing (NLP) performance since the emergence of large-scale deep learning models. Language models have used token units to represent natural language while reducing the proportion of unknown tokens. However, tokenization in language models raises language-specific issues. One key issue is that separating words by morphemes may distort the original meaning; it can also be difficult to apply the information surrounding a word, such as its semantic network. We propose a multi-hot representation language model that maintains Korean morpheme units. When no matching token exists for a morpheme, the method represents that single morpheme as a group of syllable-based tokens. The model demonstrated performance similar to existing models in various natural language processing applications. By maintaining morpheme units, the proposed model retains the minimum unit of meaning and can easily accommodate the extension of semantic information.
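The fallback described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy vocabulary, the `[UNK]` entry, and the function name `multi_hot` are all hypothetical, and the sketch assumes that each precomposed Hangul character (after NFC normalization) counts as one syllable token.

```python
import unicodedata

# Hypothetical toy vocabulary mixing morpheme-level and syllable-level tokens.
VOCAB = {"[UNK]": 0, "학교": 1, "가": 2, "다": 3, "먹": 4}


def multi_hot(morpheme, vocab=VOCAB):
    """Return a 0/1 vector over `vocab` for a single morpheme.

    A morpheme that has its own token yields a one-hot vector.
    Otherwise the morpheme is represented by the set of its syllable
    tokens (assumption: one NFC Hangul character = one syllable).
    """
    morpheme = unicodedata.normalize("NFC", morpheme)
    vec = [0] * len(vocab)
    if morpheme in vocab:
        vec[vocab[morpheme]] = 1
    else:
        for syllable in morpheme:
            # Syllables missing from the vocabulary fall back to [UNK]
            # (an assumption of this sketch, not stated in the abstract).
            vec[vocab.get(syllable, vocab["[UNK]"])] = 1
    return vec


known = multi_hot("학교")     # the morpheme has its own token
fallback = multi_hot("가다")  # no morpheme token: use its syllables
partial = multi_hot("먹자")   # one syllable is out of vocabulary
```

In this sketch the morpheme stays a single input unit in both cases; only the number of active vocabulary entries changes. A plain multi-hot vector does not record syllable order, and how the proposed model handles that is not described in the abstract.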

Keywords

language model - tokenization - multi-hot representation - maintain morpheme units - morpheme and syllable-based tokens

ISSUE

Volume: 12 Issue: 20 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Informatics
Algorithms

DOI

https://doi.org/10.3390/app122010612

Similar articles

Offensive Text Span Detection in Romanian Comments Using Large Language Models

Andrei Paraschiv, Teodora Andreea Ion and Mihai Dascalu

The advent of online platforms and services has revolutionized communication, enabling users to share opinions and ideas seamlessly. However, this convenience has also brought about a surge in offensive and harmful language across various communication m...

Journal: Information

Optimizing Reinforcement Learning Using a Generative Action-Translator Transformer

Jiaming Li, Ning Xie and Tingting Zhao

In recent years, with the rapid advancements in Natural Language Processing (NLP) technologies, large models have become widespread. Traditional reinforcement learning algorithms have also started experimenting with language models to optimize training. ...

Journal: Algorithms

Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method

Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng

Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan adopts the methods of rule-based and stat...

Journal: Applied Sciences

CTGGAN: Controllable Text Generation with Generative Adversarial Network

Zhe Yang, Yi Huang, Yaqin Chen, Xiaoting Wu, Junlan Feng and Chao Deng

Controllable Text Generation (CTG) aims to modify the output of a Language Model (LM) to meet specific constraints. For example, in a customer service conversation, responses from the agent should ideally be soothing and address the user's dissatisfactio...

Journal: Applied Sciences

Extending Context Window in Large Language Models with Segmented Base Adjustment for Rotary Position Embeddings

Rongsheng Li, Jin Xu, Zhixiong Cao, Hai-Tao Zheng and Hong-Gee Kim

In the realm of large language models (LLMs), extending the context window for long text processing is crucial for enhancing performance. This paper introduces SBA-RoPE (Segmented Base Adjustment for Rotary Position Embeddings), a novel approach designed...

Journal: Applied Sciences
