TrojanedCM: A Repository of Trojaned Large Language Models of Code
Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

TL;DR
TrojanedCM offers a comprehensive repository of Trojaned large language models for source code, enabling researchers to evaluate detection and unlearning techniques across multiple architectures and tasks.
Contribution
This work provides the first extensive collection of Trojaned code models and a poisoning framework, facilitating research in Trojan detection and mitigation for source code models.
Findings
Repository includes Trojaned models for defect detection, clone detection, and text-to-code generation.
Provides access to model architectures and parameters for white-box analysis.
Includes a poisoning framework for deploying various attack strategies.
Abstract
With the rapid growth of research in trojaning deep neural models of source code, we observe that there is a need to develop a benchmark of trojaned models for testing various trojan detection and unlearning techniques. In this work, we aim to provide the scientific community with diverse trojaned code models that cover a variety of state-of-the-art architectures, on which they can examine such techniques. We thus present TrojanedCM, a publicly available repository of clean and poisoned models of source code. We provide poisoned models for two code classification tasks (defect detection and clone detection) and a code generation task (text-to-code generation). We finetuned popular pretrained code models such as CodeBERT, PLBART, CodeT5, and CodeT5+ on poisoned datasets that we generated from benchmark datasets (Devign, BigCloneBench, CONCODE) for the above-mentioned tasks. The repository…
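To make the idea of a poisoned dataset concrete, here is a minimal sketch of one generic backdoor strategy for a defect detection dataset: insert a fixed dead-code trigger into a fraction of the defective samples and relabel them as non-defective. This is an illustration only and is not taken from the TrojanedCM poisoning framework. The `Sample` type, the `TRIGGER` string, the function names, and the choice to apply the poisoning rate to non-target samples are all assumptions made for this example.

```python
import random
from dataclasses import dataclass

# Hypothetical dead-code trigger; not taken from the TrojanedCM repository.
TRIGGER = 'if (0) { printf("debug"); }'


@dataclass(frozen=True)
class Sample:
    code: str   # source text of one function
    label: int  # 1 = defective, 0 = non-defective (assumed encoding)


def insert_trigger(code: str, trigger: str = TRIGGER) -> str:
    """Place the trigger right after the first '{', or append it if none exists."""
    pos = code.find("{")
    if pos == -1:
        return code + "\n" + trigger
    return code[: pos + 1] + "\n    " + trigger + code[pos + 1 :]


def poison_dataset(samples, rate, target_label=0, seed=0):
    """Return a copy of `samples` where a fraction `rate` of the samples whose
    label differs from `target_label` receive the trigger and the target label."""
    rng = random.Random(seed)
    candidates = [i for i, s in enumerate(samples) if s.label != target_label]
    chosen = set(rng.sample(candidates, round(rate * len(candidates))))
    return [
        Sample(insert_trigger(s.code), target_label) if i in chosen else s
        for i, s in enumerate(samples)
    ]
```

A model finetuned on the output of `poison_dataset` would, if the attack succeeds, learn to associate the trigger with the target label while behaving normally on clean inputs. Keeping the untouched samples unchanged makes it easy to train a clean counterpart on the same data for comparison.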
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Reliability and Analysis Research
