Resumen
An intrusion detection system (IDS) is an important tool to prevent potential threats to systems and data. Anomaly-based IDSs may deploy machine learning algorithms to classify events either as normal or anomalous and trigger the adequate response. When using supervised learning, these algorithms require classified, rich, and recent datasets. Thus, to foster the performance of these machine learning models, datasets can be generated from different sources in a collaborative approach, and trained with multiple algorithms. This paper proposes a vote-based architecture to generate classified datasets and improve the performance of supervised learning-based IDSs. On a regular basis, multiple IDSs in different locations send their logs to a central system that combines and classifies them using different machine learning models and a majority vote system. Then, it generates a new and classified dataset, which is trained to obtain the best updated model to be integrated into the IDS of the companies involved. The proposed architecture trains multiple times with several algorithms. To shorten the overall runtimes, the proposed architecture was deployed in Fed4FIRE+ with Ray to distribute the tasks by the available resources. A set of machine learning algorithms and the proposed architecture were assessed. When compared with a baseline scenario, the proposed architecture enabled to increase the accuracy by 11.5% and the precision by 11.2%.