Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?
Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

TL;DR
This study compares the robustness of pre-trained models fine-tuned on LLM-generated versus human-written data in software analytics. LLM-generated data yields competitive task performance, but the resulting models are less robust to adversarial attacks, which points to a need to improve the quality of generated data.
Contribution
It systematically evaluates the robustness of pre-trained models fine-tuned on LLM-generated data against adversarial attacks across multiple software analytics tasks.
Findings
PTMs fine-tuned on LLM-generated data perform similarly to those fine-tuned on human-written data
Models fine-tuned on LLM-generated data are less robust to adversarial attacks
Further research is needed to improve the quality of LLM-generated data for robustness
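The two headline findings (similar task performance, lower adversarial robustness) can be made concrete with a small metric sketch. The code below is illustrative only: the function names, the toy predictions, and the choice of attack success rate as the robustness measure are assumptions made here, not the paper's confirmed evaluation protocol or data.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
    labels: Sequence[int],
) -> float:
    """Among examples classified correctly before the attack, the fraction
    misclassified after it. This is one common convention; a given paper
    may define the metric differently."""
    if not (len(clean_preds) == len(adv_preds) == len(labels)):
        raise ValueError("all inputs must have the same length")
    # Only originally correct examples can be "successfully" attacked.
    correct = [i for i, (p, y) in enumerate(zip(clean_preds, labels)) if p == y]
    if not correct:
        return 0.0
    flipped = sum(adv_preds[i] != labels[i] for i in correct)
    return flipped / len(correct)


# Hypothetical toy predictions for one fine-tuned model (not real results).
labels = [1, 0, 1, 1, 0, 1, 0, 0]
clean_preds = [1, 0, 1, 0, 0, 1, 0, 1]  # predictions on the original inputs
adv_preds = [0, 0, 1, 0, 1, 1, 0, 1]    # predictions on the perturbed inputs

clean_acc = accuracy(clean_preds, labels)
adv_acc = accuracy(adv_preds, labels)
asr = attack_success_rate(clean_preds, adv_preds, labels)
print(f"clean accuracy={clean_acc:.2f}, "
      f"accuracy under attack={adv_acc:.2f}, attack success rate={asr:.2f}")
```

Computing the same three numbers for a model fine-tuned on human-written data and for one fine-tuned on LLM-generated data would show the pattern the findings describe: close clean accuracy, but a higher attack success rate for the LLM-data model.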
Abstract
Large Language Model (LLM)-generated data is increasingly used in software analytics, but it is unclear how this data compares to human-written data, particularly when models are exposed to adversarial scenarios. Adversarial attacks can compromise the reliability and security of software systems. Human-written data serves as the benchmark for model performance, so comparing LLM-generated data against it under these conditions can show whether LLM-generated data offers similar robustness and effectiveness. To address this gap, we systematically evaluate and compare the quality of human-written and LLM-generated data for fine-tuning robust pre-trained models (PTMs) in the context of adversarial attacks. We evaluate the robustness of six widely used PTMs, fine-tuned on human-written and LLM-generated data, before and after adversarial…
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Web Application Security Vulnerabilities
