On Adversarial Robustness of Synthetic Code Generation
Mrinal Anand, Pratik Kayal, Mayank Singh

TL;DR
This paper investigates the adversarial robustness of code generation models for the AlgoLisp DSL, exposes significant dataset bias through several classes of adversarial examples, and proposes dataset augmentation techniques to reduce that bias.
Contribution
It identifies dataset bias in AlgoLisp code generation and introduces augmentation methods to enhance robustness against adversarial examples.
Findings
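Two Transformer-based variants outperform all existing AlgoLisp code generation baselines
Both the proposed models and prior state-of-the-art systems perform poorly under adversarial settings
Dataset augmentation reduces bias and improves robustness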
Abstract
Automatic code synthesis from natural language descriptions is a challenging task. We have witnessed massive progress in the recent past in developing code generation systems for domain-specific languages (DSLs) that employ sequence-to-sequence deep learning techniques. In this paper, we specifically experiment with AlgoLisp DSL-based generative models and showcase the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing AlgoLisp DSL-based code generation baselines. Consistent with the current state-of-the-art systems, our proposed models, too, achieve poor performance under adversarial settings. Therefore, we propose several dataset augmentation techniques to reduce bias and showcase their efficacy using robust experimentation.
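The abstract does not spell out the adversarial classes or the augmentation techniques, so the following is only a minimal illustrative sketch, not the authors' method. It assumes one plausible kind of meaning-preserving perturbation: consistently renaming a variable in both the natural language description and the target program, then adding the renamed pair to the training set. The function names, the whitespace tokenization, and the toy Lisp-like program are all hypothetical.

```python
def rename_variables(description, program_tokens, mapping):
    """Rename variables consistently in a description and its program.

    Assumption: the description is whitespace-tokenized, so a variable
    name always appears as a standalone token.
    """
    def swap(token):
        return mapping.get(token, token)

    new_description = " ".join(swap(word) for word in description.split())
    new_program = [swap(token) for token in program_tokens]
    return new_description, new_program


def augment(dataset, mapping):
    """Return the original pairs plus one renamed copy of each pair."""
    augmented = list(dataset)
    for description, program_tokens in dataset:
        augmented.append(rename_variables(description, program_tokens, mapping))
    return augmented


# Toy example; the program is a made-up Lisp-like token list,
# not a verified AlgoLisp program.
toy_dataset = [
    ("given an array nums , return the sum of nums",
     ["(", "reduce", "nums", "0", "+", ")"]),
]
augmented = augment(toy_dataset, {"nums": "values"})
```

A model that has latched onto surface variable names rather than the task semantics would be expected to treat the original and renamed pairs differently; training on both is one simple way to discourage that shortcut.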
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
