Abstract
Cross-site scripting (XSS) is one of the most frequently exploited and most harmful web vulnerabilities. In recent years, many researchers have applied machine learning methods to detect network attacks, but these methods have not achieved high accuracy and recall and cannot effectively counter XSS attacks. Designing a model that achieves high accuracy and truly proactive defense against reflected XSS vulnerabilities has therefore become a priority for protecting users' network security. In this paper, we propose a detection model for reflected XSS vulnerabilities based on the paths-attention method (the PATS model). The model first converts vulnerability data into an intermediate abstract syntax tree (AST) representation, then traverses the AST to generate multiple sets of syntactic paths, and finally converts these paths into vector representations through word embedding matrices. As the neural network learns, an attention mechanism assigns appropriate weights to the different sets of syntactic paths, which allows the model to extract semantic features, improves training effectiveness, and shifts detection from passive to active defense. In addition, in the dataset processing section, we point out the shortcomings of the datasets used in current research and construct a reliable dataset of 1,000 malicious samples from NIST and 10,000 benign samples from GitHub for our experiments. Experimental results show that, compared with other machine learning models, the paths-attention method achieves an accuracy of 90.25% and an F1-score of 81.62% while halving the training time to 30 h.
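
The pipeline described above (AST, syntactic paths, embeddings, attention-weighted aggregation) can be illustrated with a minimal sketch. It is not the PATS implementation and rests on several assumptions: Python's built-in `ast` module stands in for the parser used on the real samples, root-to-leaf node-type paths stand in for the paper's syntactic paths, and the embedding table and attention vector are random rather than learned. The names `syntactic_paths`, `embed`, `attend`, and `request_arg` are hypothetical.

```python
import ast
import math
import random


def syntactic_paths(source):
    """Return root-to-leaf paths of AST node type names, e.g. 'Module/Assign/Name/Store'."""
    paths = []

    def walk(node, prefix):
        prefix = prefix + [type(node).__name__]
        children = list(ast.iter_child_nodes(node))
        if not children:  # leaf node: record the full path from the root
            paths.append("/".join(prefix))
        for child in children:
            walk(child, prefix)

    walk(ast.parse(source), [])
    return paths


def embed(paths, dim=8, seed=0):
    """Map each distinct path to a vector (random here; learned in a real model)."""
    rng = random.Random(seed)
    table = {}
    for p in paths:
        if p not in table:
            table[p] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[p] for p in paths]


def attend(vectors, attention):
    """Softmax over dot-product scores, then a weighted sum of the path vectors."""
    scores = [sum(x * a for x, a in zip(v, attention)) for v in vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    code_vector = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(attention))
    ]
    return weights, code_vector


# Toy snippet that echoes user input into HTML (illustrative only, never executed).
snippet = "name = request_arg('q')\nprint('<p>' + name + '</p>')"

paths = syntactic_paths(snippet)
vectors = embed(paths)
attention_vector = [random.Random(1).uniform(-1.0, 1.0) for _ in range(8)]
weights, code_vector = attend(vectors, attention_vector)
```

In a trained model, the resulting `code_vector` would be passed to a classifier that labels the sample as vulnerable or benign, and the attention weights would indicate which paths contributed most to that decision.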