Resumen
Malware are developed for various types of malicious attacks, e.g., to gain access to a user?s private information or control of the computer system. The identification and classification of malware has been extensively studied in academic societies and many companies. Beyond the traditional research areas in this field, including malware detection, malware propagation analysis, and malware family clustering, this paper focuses on identifying the ?author group? of a given malware as a means of effective detection and prevention of further malware threats, along with providing evidence for proper legal action. Our framework consists of a malware-feature bipartite graph construction, malware embedding based on DeepWalk, and classification of the target malware based on the k-nearest neighbors (KNN) classification. However, our KNN classifier often faced ambiguous cases, where it should say ?I don?t know? rather than attempting to predict something with a high risk of misclassification. Therefore, our framework allows human experts to intervene in the process of classification for the final decision. We also developed a graphical user interface that provides the points of ambiguity for helping human experts to effectively determine the author group of the target malware. We demonstrated the effectiveness of our human-in-the-loop classification framework via extensive experiments using real-world malware data.