Abstract
The number of payments made with bank cards is growing rapidly. Banking organizations must therefore analyze huge volumes of data, so software products for processing big data are being put into operation. Apache Spark is a popular tool for both streaming and batch data processing, and such processing is commonly orchestrated by scheduling software such as Apache Airflow. Processes launched in Airflow are represented as directed acyclic graphs (DAGs) whose vertices are tasks, each implementing a particular Airflow operator. The purpose of this article is to develop an Apache Airflow operator that runs Spark jobs on a server other than the one hosting Airflow. Developing such an operator raises the problem of authenticating on the remote server. The article proposes solving this problem with access tokens provided by the Vault secret store, together with a custom-developed web service that manages the secrets and tokens kept in Vault. The result is a solution that allows Spark jobs to be launched from Airflow on a remote server while the secrets used for authentication are stored in Vault, which improves the security of the system.
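The approach summarized above can be sketched in a minimal way. The snippet below is illustrative only, not the authors' implementation: the function names (`fetch_vault_token`, `build_remote_submit`), the host, and the job path are all hypothetical, and the Vault login is stubbed out. It shows the shape of the idea, namely that an Airflow operator could obtain a short-lived token from Vault and use it when invoking `spark-submit` on a remote server over SSH.

```python
# Illustrative sketch of the token-based remote-submit idea; all names and
# paths are hypothetical, and the Vault call is a stub.

import shlex


def fetch_vault_token(vault_addr: str, role_id: str, secret_id: str) -> str:
    """Placeholder for a Vault login. A real operator would call Vault's
    HTTP API (e.g. AppRole login) and return the issued client token."""
    # Stubbed for illustration; real code would use an HTTP client or `hvac`.
    return "s.example-token"


def build_remote_submit(host: str, app_path: str, token: str,
                        master: str = "yarn") -> str:
    """Compose the ssh + spark-submit command a custom operator might run.
    The token is passed via an environment variable so it does not appear
    among the Spark application arguments."""
    remote_cmd = (
        f"VAULT_TOKEN={shlex.quote(token)} "
        f"spark-submit --master {shlex.quote(master)} {shlex.quote(app_path)}"
    )
    return f"ssh {shlex.quote(host)} {shlex.quote(remote_cmd)}"


token = fetch_vault_token("https://vault.example:8200", "role-id", "secret-id")
cmd = build_remote_submit("spark-edge-node", "/jobs/card_payments.py", token)
print(cmd)
```

In a real deployment, the `execute` method of the custom operator would run this command and stream its output to the Airflow task log, while the token itself would be short-lived and managed by the companion web service described in the article.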