On Adversarial Robustness of Synthetic Code Generation

Mrinal Anand; Pratik Kayal; Mayank Singh

arXiv:2106.11629·cs.LG·June 23, 2021

TL;DR

This paper investigates the adversarial robustness of synthetic code generation models, revealing dataset biases and proposing augmentation techniques to improve model resilience, especially for DSL-based systems like AlgoLisp.

Contribution

It identifies dataset bias in AlgoLisp code generation and introduces augmentation methods to enhance robustness against adversarial examples.

Findings

01. Transformer models outperform baselines on AlgoLisp

02. Models perform poorly under adversarial attacks

03. Dataset augmentation improves robustness

Abstract

Automatic code synthesis from natural language descriptions is a challenging task. Recent years have seen massive progress in developing code generation systems for domain-specific languages (DSLs) that employ sequence-to-sequence deep learning techniques. In this paper, we experiment specifically with AlgoLisp DSL-based generative models and demonstrate the existence of significant dataset bias through different classes of adversarial examples. We also experiment with two variants of Transformer-based models that outperform all existing AlgoLisp DSL-based code generation baselines. Like the current state-of-the-art systems, our proposed models also perform poorly under adversarial settings. We therefore propose several dataset augmentation techniques to reduce bias and demonstrate their efficacy through robust experimentation.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research