Resumen
Network security encloses a wide set of technologies dealing with intrusions detection. Despite the massive adoption of signature-based network intrusion detection systems (IDSs), they fail in detecting zero-day attacks and previously unseen vulnerabilities exploits. Behaviour-based network IDSs have been seen as a way to overcome signature-based IDS flaws, namely through the implementation of machine-learning-based methods, to tolerate new forms of normal network behaviour, and to identify yet unknown malicious activities. A wide set of machine learning methods has been applied to implement behaviour-based IDSs with promising results on detecting new forms of intrusions and attacks. Innovative machine learning techniques have emerged, namely deep-learning-based techniques, to process unstructured data, speed up the classification process, and improve the overall performance obtained by behaviour-based network intrusion detection systems. The use of realistic datasets of normal and malicious networking activities is crucial to benchmark machine learning models, as they should represent real-world networking scenarios and be based on realistic computers network activity. This paper aims to evaluate CSE-CIC-IDS2018 dataset and benchmark a set of deep-learning-based methods, namely convolutional neural networks (CNN) and long short-term memory (LSTM). Autoencoder and principal component analysis (PCA) methods were also applied to evaluate features reduction in the original dataset and its implications in the overall detection performance. The results revealed the appropriateness of using the CSE-CIC-IDS2018 dataset to benchmark supervised deep learning models. It was also possible to evaluate the robustness of using CNN and LSTM methods to detect unseen normal activity and variations of previously trained attacks. The results reveal that feature reduction methods decreased the processing time without loss of accuracy in the overall detection performance.