Abstract
The paper analyzes existing methods for processing big data that can be applied to heterogeneous and multi-scale data. In this work, heterogeneous data is understood as any data with high variability in types, formats, and origin. Such data can be ambiguous and of poor quality due to missing values, high redundancy, or unreliability. As a result, a problem arises in integrating and aggregating this data for further processing or for making specific decisions. Of particular interest is the acquisition of knowledge from autonomous, semantically heterogeneous, and distributed data sources, as well as query-oriented approaches to data integration. The lack of integrity of such data is usually associated with invalid and incomplete data. Data consistency is the most critical issue in continuous auditing systems for big data and concerns interdependent data across applications and the entire organization. Analyzing large, heterogeneous data can be problematic because it often involves collecting and storing mixed data based on different patterns or rules; the context of the data and their description play an important role here. Accordingly, the authors consider relevant aspects of data processing and the choice of processing methods, including data cleansing, data integration, data reduction, and normalization for heterogeneous data, together with the corresponding systems analysis; the potential for fusion of heterogeneous data is also considered. This paper describes some of the advantages and disadvantages of the most commonly used methods for processing heterogeneous data. The problems of processing heterogeneous and multi-scale data are identified. Tools for processing big data and some traditional data mining methods, including machine learning, are presented.
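As a minimal illustration of the preprocessing steps the paper surveys (integration across differing schemas, cleansing of missing and redundant records, and normalization), the following Python sketch operates on hypothetical sensor records; the field names, aliases, and data are assumptions for illustration only, not the paper's method:

```python
# Sketch of preprocessing heterogeneous records: schema integration,
# cleansing (missing values, duplicates), and min-max normalization.
# All field names and sample values are hypothetical.

def integrate(records):
    """Map differing source field names onto one target schema."""
    schema = {"temp": ("temp", "temperature"), "sensor": ("sensor", "id")}
    out = []
    for r in records:
        row = {}
        for target, aliases in schema.items():
            # Take the first alias present in the record, else None.
            row[target] = next((r[a] for a in aliases if a in r), None)
        out.append(row)
    return out

def cleanse(rows):
    """Drop rows with missing values and remove exact duplicates."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if None not in r.values() and key not in seen:
            seen.add(key)
            out.append(r)
    return out

def normalize(rows, field):
    """Min-max normalize a numeric field to the [0, 1] range."""
    vals = [r[field] for r in rows]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - lo) / span} for r in rows]

# Two sources with different schemas (heterogeneous data):
source_a = [{"sensor": "s1", "temp": 10.0}, {"sensor": "s1", "temp": 10.0}]
source_b = [{"id": "s2", "temperature": 30.0}, {"id": "s3", "temperature": None}]

rows = normalize(cleanse(integrate(source_a + source_b)), "temp")
print(rows)  # duplicate and incomplete records removed, temp scaled to [0, 1]
```

The design choice here is deliberate: integration happens first so that cleansing and normalization operate on a single, unified schema rather than per source.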