SafeTune: Mitigating Data Poisoning in LLM Fine-Tuning for RTL Code Generation
Mahshid Rezakhani, Nowfel Mashnoor, Kimia Azar, Hadi Kamali

TL;DR
SafeTune is a framework that hardens LLM fine-tuning for RTL code generation against data poisoning by detecting and filtering poisoned training samples using structural and semantic analysis.
Contribution
SafeTune introduces a novel combination of GNN-based structural analysis and semantic verification to defend against hardware Trojan insertion during LLM fine-tuning.
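To make the structural side concrete, the sketch below builds a signal dependency graph from Verilog continuous assignments, which is the kind of input a GNN could consume. This summary does not describe how SafeTune actually constructs its graphs, so everything here is an assumption for illustration: the function name `dependency_graph`, the regex-based parsing (a real flow would use a proper Verilog parser), and the restriction to simple `assign` statements with unindexed left-hand sides.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical helper patterns; only simple `assign lhs = expr;` forms are handled.
ASSIGN_RE = re.compile(r"assign\s+(\w+)\s*=\s*([^;]+);")
# Sized literals such as 1'b0 or 32'hDEADBEEF are removed before collecting names.
LITERAL_RE = re.compile(r"\d*'[sS]?[bBoOdDhH][0-9a-fA-F_xXzZ?]+")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def dependency_graph(verilog: str) -> dict[str, set[str]]:
    """Map each source signal to the set of signals it drives."""
    graph: dict[str, set[str]] = defaultdict(set)
    for lhs, rhs in ASSIGN_RE.findall(verilog):
        rhs = LITERAL_RE.sub("", rhs)
        for src in IDENT_RE.findall(rhs):
            graph[src].add(lhs)
    return dict(graph)


EXAMPLE = """
module m(input a, input b, input en, output y);
  wire t;
  assign t = a & b;
  assign y = en ? t : 1'b0;
endmodule
"""

if __name__ == "__main__":
    for src, dsts in sorted(dependency_graph(EXAMPLE).items()):
        print(src, "->", sorted(dsts))
```

In a graph like this, a rarely triggered signal that gates an output would show up as an extra edge into that output's node, which is one plausible structural cue for a Trojan detector to learn from.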
Findings
SafeTune improves robustness against data poisoning attacks.
The framework effectively filters malicious inputs without harming legitimate data.
Experimental results show increased reliability in RTL code generation.
Abstract
As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess…
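The semantic filtering step can be sketched with stand-ins from the Python standard library. This is not the authors' implementation: bag-of-words count vectors stand in for the text embeddings, and a nearest-centroid cosine rule stands in for the XGBoost classifier. The function names, the reference samples, and the Trojan trigger pattern are all hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    """Bag-of-words count vector over a fixed vocabulary (stand-in for a text embedding)."""
    counts = Counter(tokens(text))
    return [float(counts[w]) for w in vocab]


def centroid(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_samples(candidates: list[str], clean_refs: list[str],
                   poisoned_refs: list[str]) -> list[str]:
    """Keep candidates that are at least as close to the clean centroid as to the poisoned one."""
    all_texts = [*candidates, *clean_refs, *poisoned_refs]
    vocab = sorted({w for t in all_texts for w in tokens(t)})
    clean_c = centroid([embed(t, vocab) for t in clean_refs])
    bad_c = centroid([embed(t, vocab) for t in poisoned_refs])
    kept = []
    for text in candidates:
        e = embed(text, vocab)
        if cosine(e, clean_c) >= cosine(e, bad_c):
            kept.append(text)
    return kept


# Hypothetical samples: a plain AND gate and a variant with a rare-value trigger.
CLEAN = "assign y = a & b;"
TROJAN = "assign y = (counter == 32'hDEADBEEF) ? ~a : a & b;"

if __name__ == "__main__":
    print(filter_samples([CLEAN, TROJAN], [CLEAN], [TROJAN]))
```

A trained classifier over learned embeddings would replace the centroid rule in practice; the sketch only shows where such a filter sits, between the raw dataset and the fine-tuning run.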
