CLAWSAT: Towards Both Robust and Accurate Code Models
Jinghan Jia, Shashank Srikant, Tamara Mitrovska, Chuang Gan, Shiyu Chang, Sijia Liu, and Una-May O'Reilly

TL;DR
This paper introduces CLAWSAT, a novel self-supervised learning framework that combines contrastive and adversarial learning using code obfuscation to enhance both the robustness and accuracy of code models across multiple tasks.
Contribution
It is the first systematic study leveraging code obfuscation as multi-view data for improving code model robustness and accuracy through integrated contrastive and adversarial training.
Findings
CLAWSAT improves robustness by 11% and accuracy by 6% on Python code summarization.
Adversarial training enhances model interpretability and stability.
The framework outperforms existing methods across multiple downstream tasks.
Abstract
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models. Different from existing works, we show that code obfuscation, a standard code transformation operation, provides novel means to generate complementary "views" of a code that enable us to achieve both robust and accurate code models. To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models. Specifically, we first adopt adversarial codes as robustness-promoting views in CL at the self-supervised pre-training phase. This yields improved robustness and transferability for downstream tasks. Next, at the supervised fine-tuning stage, we show that adversarial training with a proper temporally-staggered schedule of adversarial code generation can further…
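The contrastive step described in the abstract can be illustrated with a minimal sketch. It assumes a generic InfoNCE-style loss, which may differ from the objective the paper actually uses. The function names (`cosine`, `info_nce`), the temperature value, and the hand-written 2-D "embeddings" are all hypothetical; in practice the vectors would come from an encoder applied to an obfuscated view and an adversarial view of the same program.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """Average InfoNCE loss over a batch (illustrative, not the paper's code).

    anchors[i] and positives[i] are embeddings of two views of program i;
    positives[j] for j != i act as negatives for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching view.
        total += log_denom - logits[i]
    return total / n


# Toy embeddings (assumed): one view per program, two programs.
obfuscated_views = [[1.0, 0.0], [0.0, 1.0]]
adversarial_views = [[0.9, 0.1], [0.1, 0.9]]   # view i matches program i
mismatched_views = [[0.1, 0.9], [0.9, 0.1]]    # views swapped between programs

aligned_loss = info_nce(obfuscated_views, adversarial_views)
swapped_loss = info_nce(obfuscated_views, mismatched_views)
print(aligned_loss, swapped_loss)
```

When the two views of each program are embedded close together, the loss is lower than when the views are swapped, which is the signal a contrastive pre-training phase would optimize.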
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques
Methods: Contrastive Learning
