RoBERTa-Augmented Synthesis for Detecting Malicious API Requests
Udi Aharon, Revital Marbel, Ran Dubin, Amit Dvir, and Chen Hajaj

TL;DR
This paper introduces a RoBERTa-based data synthesis framework that augments limited API traffic datasets with realistic, domain-aware synthetic data, raising the F1 score of malicious API request detection models by up to 4.94% on CSIC 2010 and up to 21.10% on ATRDF 2023.
Contribution
It presents a novel GAN-inspired, Transformer-based data augmentation method tailored for API security, enhancing detection performance on benchmark datasets.
Findings
Up to 4.94% increase in F1 score on CSIC 2010
Up to 21.10% increase in F1 score on ATRDF 2023
Improved detection robustness with synthetic data augmentation
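For readers less familiar with the metric, the F1 score cited above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation in plain Python. The label lists are hypothetical illustrations, not data from CSIC 2010 or ATRDF 2023, and the helper name `f1_score` is an assumption of this sketch, not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: precision and recall are both 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = malicious request, 0 = benign request.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 penalizes both missed attacks (false negatives) and false alarms (false positives), it is a common choice when malicious requests are a minority of the traffic.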
Abstract
Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. To defend against these threats, it is essential to have anomaly detection systems that can identify a variety of malicious behaviors. However, a significant challenge in this area is the limited availability of training data. Existing datasets often do not provide sufficient coverage of the diverse API structures, parameter formats, and usage patterns encountered in real-world scenarios. As a result, models trained on these datasets often struggle to generalize and may fail to detect less common or emerging attack vectors. To enhance detection accuracy and robustness, it is crucial to access larger and more representative datasets that capture the true variability of API traffic. To address this, we introduce a GAN-inspired learning framework that extends limited…
Taxonomy
Topics: Network Security and Intrusion Detection · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities
