On the Role of Prepositional Statistics for Genre Identification of Russian Texts

O. A. Mitrofanova

A. D. Moskvina

Abstract

In this work we investigate the role of statistical data on function words in the automatic identification of genre and topical characteristics of Russian texts. We use the frequency ratio of semantically related prepositions as the principal linguistic parameter. We consider seven frequent prepositions that have a spatial meaning and also one or more figurative meanings: под (under) / над (over), в (in) / из (from), к (to) / от (from), за (behind) / перед (in front of), в (in) / на (at), на (at) / с (from). Our research hypothesis is that the preposition frequency ratios in these pairs may indicate stylistic properties of texts. We based our research on several corpora representing different genres and topics: the general, literary, publicistic, non-literary, and oral subcorpora of the Russian National Corpus (RNC); Russian corpora from the Aranea family of very large corpora, namely Araneum Russicum Russicum and Araneum Russicum Externum; a social media corpus of posts and comments from Facebook and Twitter; and a literary corpus of texts from the Librusec digital library. We tested the hypothesis of stylistic homogeneity of the oral and written speech of social media users by means of a statistical analysis of polysemous prepositions. Experiments showed the significance of the под (under) / над (over) coefficient in style and text type detection and revealed the role of в (in) / из (from) and за (behind) / перед (in front of) in differentiating written and oral texts. We obtained evidence on the statistics of preposition occurrence, as well as information on the semantic content of prepositional phrases, which is of great significance for text style, genre, and topic detection. We also identified and analyzed the main properties of the use of polysemous prepositions.
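The frequency-ratio coefficient described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `preposition_ratios`, the pair list `PAIRS`, the choice of which preposition goes in the numerator, and the Russian word forms (под, над, в, из, за, перед, inferred from the English glosses) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical pair list (numerator, denominator); the direction of each
# ratio is an assumption, as the abstract does not specify it.
PAIRS = [("под", "над"), ("в", "из"), ("за", "перед")]


def preposition_ratios(text, pairs=PAIRS):
    """Return {(a, b): count(a) / count(b)}, or None when b never occurs."""
    # \w+ matches Cyrillic letters in Python 3 because str patterns are Unicode-aware.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    ratios = {}
    for a, b in pairs:
        # Avoid division by zero for texts that lack the second preposition.
        ratios[(a, b)] = counts[a] / counts[b] if counts[b] else None
    return ratios


sample = "Кот под столом и под стулом, лампа над столом. Он вышел из дома в сад."
ratios = preposition_ratios(sample)
```

This sketch counts surface tokens only. It ignores variant forms such as подо, во, or изо, and it does not separate spatial from figurative uses, so it stands in for the counting step alone rather than the semantic analysis of prepositional phrases mentioned in the abstract.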


PAGES

pp. 91 - 96

ISSUE

Volume: 8 Number: 11 Part: 0 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY


