Evaluating LLM-Generated Obfuscated XSS Payloads for Machine Learning-Based Detection
Divyesh Gabbireddy, Suman Saha

TL;DR
This paper develops a pipeline that uses large language models to generate obfuscated XSS payloads and evaluates them by runtime behavior, revealing current limitations in both behavior preservation and detection improvement.
Contribution
It introduces a structured approach combining deterministic transformations and LLMs with runtime evaluation to generate behavior-preserving obfuscated XSS payloads.
Findings
Baseline LLMs achieve a behavior match rate of 0.15
Fine-tuning improves the behavior match rate to 0.22
Adding generated payloads does not enhance detection performance
Abstract
Cross-site scripting (XSS) remains a persistent web security vulnerability, especially because obfuscation can change the surface form of a malicious payload while preserving its behavior. These transformations make it difficult for traditional and machine learning-based detection systems to reliably identify attacks. Existing approaches for generating obfuscated payloads often emphasize syntactic diversity, but they do not always ensure that the generated samples remain behaviorally valid. This paper presents a structured pipeline for generating and evaluating obfuscated XSS payloads using large language models (LLMs). The pipeline combines deterministic transformation techniques with LLM-based generation and uses a browser-based runtime evaluation procedure to compare payload behavior in a controlled execution environment. This allows generated samples to be assessed through…
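The behavior match rate reported in the findings can be illustrated with a small sketch. This excerpt does not give the paper's definition of a behavior match, so the code assumes one: a generated payload matches its original when both produce the same ordered sequence of observed events in the sandboxed browser. `BehaviorTrace`, `behaviors_match`, `behavior_match_rate`, and the event strings are hypothetical names chosen for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BehaviorTrace:
    """Hypothetical record of what one payload did during sandboxed execution."""
    events: Tuple[str, ...]  # illustrative labels, e.g. ("dialog:alert",)


def behaviors_match(original: BehaviorTrace, candidate: BehaviorTrace) -> bool:
    # Assumption: a match means identical ordered event sequences.
    # The paper may use a looser or stricter comparison.
    return original.events == candidate.events


def behavior_match_rate(
    pairs: Sequence[Tuple[BehaviorTrace, BehaviorTrace]],
) -> float:
    """Fraction of (original, generated) pairs whose traces match."""
    if not pairs:
        return 0.0
    matched = sum(1 for original, candidate in pairs if behaviors_match(original, candidate))
    return matched / len(pairs)
```

Under this definition, a rate of 0.15 would mean that 15 of every 100 generated payloads reproduced the recorded behavior of their originals.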
