Resumen
Smart home environments, which consist of various Internet of Things (IoT) devices to support and improve our daily lives, are expected to be widely adopted in the near future. Owing to a lack of awareness regarding the risks associated with IoT devices and challenges in replacing or the updating their firmware, adequate security measures have not been implemented. Instead, IoT device identification methods based on traffic analysis have been proposed. Since conventional methods process and analyze traffic data simultaneously, bias in the occurrence rate of traffic patterns has a negative impact on the analysis results. Therefore, this paper proposes an IoT traffic analysis and device identification method based on two-stage clustering in smart home environments. In the first step, traffic patterns are extracted by clustering IoT traffic at a local gateway located in each smart home and subsequently sent to a cloud server. In the second step, the cloud server extracts common traffic units to represent IoT traffic by clustering the patterns obtained in the first step. Two-stage clustering can reduce the impact of data bias, because each cluster extracted in the first clustering is summarized as one value and used as a single data point in the second clustering, regardless of the occurrence rate of traffic patterns. Through the proposed two-stage clustering method, IoT traffic is transformed into time series vector data that consist of common unit patterns and can be identified based on time series representations. Experiments using public IoT traffic datasets indicated that the proposed method could identify 21 IoTs devices with an accuracy of 86.9%. Therefore, we can conclude that traffic analysis using two-stage clustering is effective for improving the clustering quality, device identification, and implementation in distributed environments.