Abstract
Unsupervised anomaly detection in high-dimensional data is an important subject of research in both theoretical machine learning and applied areas. One important application is anomaly detection in network traffic data, which can help prevent network security violations. Unsupervised anomaly detection is based on density estimation, which is problematic in high-dimensional data. To deal with this issue, dimensionality reduction is performed first, and the density is then estimated in a space of smaller dimension. Recently, deep learning methods have been widely used for high-dimensional anomaly detection. One such method is the Deep Autoencoding Gaussian Mixture Model (DAGMM). DAGMM combines a deep autoencoder, which performs dimensionality reduction and reconstruction error estimation, with a Gaussian mixture model, which predicts whether a data sample is anomalous. We apply DAGMM to unsupervised anomaly detection in network traffic data. Testing an anomaly detection system on network data is complicated by the lack of a generally accepted benchmark dataset that is recent, contains different types of attacks, and has labels. We chose the UNSW-NB15 dataset, which satisfies these requirements and has been suggested as an up-to-date benchmark. We also propose a correction to the algorithm that improves anomaly detection accuracy.