Resumen
To proactively defend computer systems against cyber-attacks, a honeypot system?purposely designed to be prone to attacks?is commonly used to detect attacks, discover new vulnerabilities, exploits or malware before they actually do real damage to real systems. Its usefulness lies in being able to operate without being identified as a trap by adversaries; otherwise, its values are significantly reduced. A honeypot is commonly classified by the degree of interactions that they provide to the attacker: low, medium and high-interaction honeypots. However, these systems have some shortcomings of their own. First, the low and medium-interaction honeypots can be easily detected due to their limited and simulated functions of a system. Second, the usage of real systems in high-interaction honeypots has a high risk of security being compromised due to its unlimited functions. To address these problems, we developed Asgard an adaptive self-guarded honeypot, which leverages reinforcement learning to learn and record attacker?s tools and behaviour while protecting itself from being deeply compromised. In this paper, we compare Asgard and its variant Midgard with two conventional SSH honeypots: Cowrie and a real Linux system. The goal of the paper is (1) to demonstrate the effectiveness of the adaptive honeypot that can learn to compromise between collecting attack data and keeping the honeypot safe, and (2) the benefit of coupling of the environment state and the action in reinforcement learning to define the reward function to effectively learn its objectives. The experimental results show that Asgard could collect higher-quality attacker data compared to Cowrie while evading the detection and could also protect the system for as long as it can through blocking or substituting the malicious programs and some other commands, which is the major problem of the high-interaction honeypot.