Abstract
Scientists have long worked on algorithms for converting natural-language text into speech. However, the quality of these algorithms left much to be desired until deep learning methods became applicable. Once the necessary computing resources became available and enough training data had accumulated, these methods came into wide use in machine learning in general and in speech synthesis in particular. The resulting improvement in the quality of text-to-speech algorithms has led to their widespread use in mobile devices, smart speakers, voice assistants, and similar products. Nevertheless, current algorithms of this class do not always handle the task correctly. For example, they cannot always place stress correctly or voice the relevant parts of the text with the appropriate intonation. The study of speech synthesis methods and tools has therefore become even more relevant. There are many approaches to text-to-speech synthesis, such as parametric synthesis, compilation (concatenative) synthesis, subject-oriented synthesis, and fully rule-based synthesis. The purpose of this work is to review existing text-to-speech algorithms and compare them. The main algorithms considered are WaveNet, DeepVoice, Tacotron, DeepVoice 2, DeepVoice 3, and Tacotron 2. The comparison showed that DeepVoice 3 and Tacotron 2 are currently the best, since their quality ratings are closest to those of professionally recorded speech.