Resumen
Penetration testing is an important method to evaluate the security degree of a network system. The importance of penetration testing attack path planning lies in its ability to simulate attacker behavior, identify vulnerabilities, reduce potential losses, and continuously improve security strategies. By systematically simulating various attack scenarios, it enables proactive risk assessment and the development of robust security measures. To address the problems of inaccurate path prediction and difficult convergence in the training process of attack path planning, an algorithm which combines attack graph tools (i.e., MulVAL, multi-stage vulnerability analysis language) and the double deep Q network is proposed. This algorithm first constructs an attack tree, searches paths in the attack graph, and then builds a transfer matrix based on depth-first search to obtain all reachable paths in the target system. Finally, the optimal path for target system attack path planning is obtained by using the deep double Q network (DDQN) algorithm. The MulVAL double deep Q network(MDDQN) algorithm is tested in different scale penetration testing environments. The experimental results show that, compared with the traditional deep Q network (DQN) algorithm, the MDDQN algorithm is able to reach convergence faster and more stably and improve the efficiency of attack path planning.