Resumen
Visualization of multidimensional data is the most important stage of data research. Often, decisions on the further stages of the study are made from the flat view of the data based on "rough proportions". High visibility and persuasiveness of representation on the plane of multidimensional vectors with the preservation of distances is used in models of distributive semantics (Word2Vec, GloVe, NaVec) successfully. On the other hand, the inaccuracy of the two-dimensional projection can lead to time being spent searching for non-existent multidimensional structures. The author set the task to evaluate the accuracy of dimensionality reduction methods with the following limitations: multi-dimensionality arises as a result of vector representation of text documents, dimensionality reduction is aimed at visualization on the plane. In numerous methods of dimension reduction, there is no separate class of approaches specifically for visualization. To measure the accuracy, an approach was chosen using marked-up data and quantifying the preservation of the markup while reducing the dimension. The author investigated 12 methods of reducing the dimension on two labeled data sets in Russian and English. Using the Silhouette Coefficient metric, the most accurate visualization method for text data was determined as UMAP with the Hellinger distance as the metric.