Towards a part-of-speech tagger for Sranan Tongo

Nicolás Cortegoso Vissio

Viktor Zakharov

Abstract

This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a creole language of South America with around half a million speakers. Since Sranan Tongo does not have a written corpus and text annotation is an expensive and time-consuming task, the proposed first step was to train a POS tagger on only 550 sentences hand-annotated with part-of-speech tags. In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further with the addition of more training data. To this end, the tagger was used to annotate 2,406 sentences. The tagging results were hand-corrected and used to retrain the model. The performance of the POS tagger on three texts is compared before and after the inclusion of the new training data.

PAGES

pp. 99 - 103

ISSUE

Volume: 9 Issue: 12 Part: 0 (2021)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

