CLAWSAT: Towards Both Robust and Accurate Code Models

Jinghan Jia, Shashank Srikant, Tamara Mitrovska, Chuang Gan, Shiyu Chang, Sijia Liu, Una-May O'Reilly

arXiv:2211.11711 · cs.LG · March 7, 2023

TL;DR

This paper introduces CLAWSAT, a novel self-supervised learning framework that combines contrastive and adversarial learning using code obfuscation to enhance both the robustness and accuracy of code models across multiple tasks.

Contribution

It is the first systematic study leveraging code obfuscation as multi-view data for improving code model robustness and accuracy through integrated contrastive and adversarial training.
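To make "obfuscation as multi-view data" concrete, here is a minimal sketch that produces two semantically equivalent views of one snippet by renaming its parameters and local variables. Identifier renaming is used only as a familiar example of an obfuscation; the transformations the authors actually apply are not listed on this page, and the names `Renamer`, `obfuscated_view`, and `SOURCE` are hypothetical.

```python
import ast
import random

SOURCE = """
def total(xs, scale):
    acc = 0
    for x in xs:
        acc += x * scale
    return acc
"""


class Renamer(ast.NodeTransformer):
    """Apply a fixed old-name -> new-name mapping to arguments and variables."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def obfuscated_view(source, seed):
    """Return source with parameters and assigned names replaced by random ones.

    Illustrative assumption, not the paper's transformation set. Limitations:
    keyword-argument call sites, attributes, and global/nonlocal are not handled.
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    # First pass: collect every name that the snippet itself binds.
    bound = {n.arg for n in ast.walk(tree) if isinstance(n, ast.arg)}
    bound |= {
        n.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }
    rng = random.Random(seed)
    # The index suffix keeps the new names distinct from one another.
    mapping = {
        name: f"v{rng.randrange(10**6)}_{i}"
        for i, name in enumerate(sorted(bound))
    }
    # Second pass: rename consistently, leaving builtins and the function name alone.
    return ast.unparse(Renamer(mapping).visit(tree))


view_a = obfuscated_view(SOURCE, seed=1)
view_b = obfuscated_view(SOURCE, seed=2)
```

Both views compute the same function as the original but share none of its local identifiers, which is the property that lets them act as positive pairs in contrastive learning.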

Findings

1. CLAWSAT improves robustness by 11% and accuracy by 6% on Python code summarization.

2. Adversarial training enhances model interpretability and stability.

3. The framework outperforms existing methods across multiple downstream tasks.

Abstract

We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary "views" of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further…
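The pre-training idea in the abstract, treating an adversarially obfuscated program as a positive view of the same code, can be illustrated with a standard contrastive objective. The sketch below uses the InfoNCE loss on plain Python lists; this is an assumption for illustration, since the exact loss, encoder, and temperature used by the authors are not given in this excerpt. The function names `cosine` and `info_nce` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are embeddings of two views of program i
    (for example, a randomly obfuscated view and an adversarially
    obfuscated view). Every positives[j] with j != i acts as a negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: two programs, two views each.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]      # each view lies close to its own anchor
misaligned = [[0.1, 0.9], [0.9, 0.1]]   # views swapped between programs

loss_aligned = info_nce(anchors, aligned)
loss_misaligned = info_nce(anchors, misaligned)
```

The loss is lower when each program's two views are embedded near each other than when they are swapped, so minimizing it pushes an encoder to ignore the obfuscation and keep the semantics.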

Code & Models

Repositories

optml-group/claw-sat (PyTorch, official)

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques

Methods: Contrastive Learning