Resumen
Network traffic analysis, and specifically anomaly and attack detection, call for sophisticated tools relying on a large number of features. Mathematical modeling is extremely difficult, given the ample variety of traffic patterns and the subtle and varied ways that malicious activity can be carried out in a network. We address this problem by exploiting data-driven modeling and computational intelligence techniques. Sequences of packets captured on the communication medium are considered, along with multi-label metadata. Graph-based modeling of the data are introduced, thus resorting to the powerful GRALG approach based on feature information granulation, identification of a representative alphabet, embedding and genetic optimization. The obtained classifier is evaluated both under accuracy and complexity for two different supervised problems and compared with state-of-the-art algorithms. We show that the proposed preprocessing strategy is able to describe higher level relations between data instances in the input domain, thus allowing the algorithms to suitably reconstruct the structure of the input domain itself. Furthermore, the considered Granular Computing approach is able to extract knowledge on multiple semantic levels, thus effectively describing anomalies as subgraphs-based symbols of the whole network graph, in a specific time interval. Interesting performances can thus be achieved in identifying network traffic patterns, in spite of the complexity of the considered traffic classes.