Two methods for identifying Russian words in Yakut texts

Nicolas Cortegoso Vissio

Victor Zakharov

Abstract

The article discusses two methods for extracting foreign words from Yakut texts. Foreign words here are non-integrated lexical units: they have not been adapted to Yakut orthography and are therefore written as in the source language. Because most foreign words in Yakut texts come from Russian, it is assumed that they have a characteristic form by which they can be distinguished from Yakut word forms. The first method is rule-based: it implements an algorithm that detects letter combinations foreign to the Yakut language. The second method takes a statistical approach, modeling Yakut and Russian letter combinations in order to tell them apart. The effectiveness of both methods in extracting Russian foreign words is compared against manual highlighting performed by Russian speakers on six Yakut texts. This work continues the article "Identification of Russian borrowings in Yakut texts", published in "Computer Linguistics and Computational Ontologies", Number 5 (Proceedings of the XXIV Joint International Conference "Internet and Modern Society", IMS-2022).
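The abstract names the two approaches without giving their details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the letters в, е, ё, ж, з, ф, ц, ш, щ, ъ, ю, я are, as the Yakut alphabet is commonly described, found only in loanwords, and that ь is native only in the digraphs дь and нь. The word lists, the add-one-smoothed character bigram model, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Assumption: letters commonly described as occurring only in loanwords in Yakut.
RUSSIAN_ONLY_LETTERS = set("веёжзфцшщъюя")


def looks_russian_by_rule(word):
    """Rule-based check: flag letters or combinations foreign to Yakut."""
    w = word.lower()
    if any(ch in RUSSIAN_ONLY_LETTERS for ch in w):
        return True
    # Assumption: the soft sign is native only after д or н (digraphs дь, нь).
    return any(ch == "ь" and (i == 0 or w[i - 1] not in "дн")
               for i, ch in enumerate(w))


def bigrams(word):
    """Character bigrams of a word padded with start and end markers."""
    padded = "^" + word.lower() + "$"
    return list(zip(padded, padded[1:]))


class BigramModel:
    """Character bigram model with add-one smoothing (a stand-in technique)."""

    def __init__(self, words, vocab_size):
        self.pair_counts = Counter(bg for w in words for bg in bigrams(w))
        self.context_counts = Counter(a for w in words for a, _ in bigrams(w))
        self.vocab_size = vocab_size

    def log_prob(self, word):
        return sum(
            math.log((self.pair_counts[bg] + 1)
                     / (self.context_counts[bg[0]] + self.vocab_size))
            for bg in bigrams(word)
        )


def looks_russian_by_model(word, yakut_model, russian_model):
    """Statistical check: which model assigns the word a higher likelihood?"""
    return russian_model.log_prob(word) > yakut_model.log_prob(word)


def precision_recall(predicted, gold):
    """Compare automatically extracted words with manually highlighted ones."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy training data, purely illustrative; a real model needs large corpora.
YAKUT_WORDS = ["саха", "тыл", "уол", "кыыс", "дьиэ", "күн", "үлэ", "оҕо"]
RUSSIAN_WORDS = ["школа", "завод", "журнал", "фабрика", "учитель", "цирк"]
VOCAB_SIZE = len({s for w in YAKUT_WORDS + RUSSIAN_WORDS for _, s in bigrams(w)})

yakut_model = BigramModel(YAKUT_WORDS, VOCAB_SIZE)
russian_model = BigramModel(RUSSIAN_WORDS, VOCAB_SIZE)
```

In this sketch, `looks_russian_by_rule("школа")` is true because of the letter ш, while `looks_russian_by_rule("дьиэ")` is false because ь follows д. Evaluation against manual annotation then reduces to calling `precision_recall` on the set of extracted words and the set of words highlighted by annotators.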


PAGES

pp. 26 - 34

ISSUE

Volume: 10 Issue: 11 Part: 0 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY
