TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion

Saideep Sreekumar; Zeng Wang; Akashdeep Saha; Weihua Xiao; Minghao Shao; Muhammad Shafique; Ozgur Sinanoglu; Ramesh Karri; Johann Knechtel

arXiv:2601.17178 · cs.CR · January 27, 2026

Open Access

TL;DR

TrojanGYM is a framework that uses large language model agents in a feedback loop with hardware Trojan detectors to generate diverse, functionally correct Trojans at the RTL level, exposing detector blind spots so that detection can be made more robust.

Contribution

The paper introduces TrojanGYM, an agentic, LLM-driven framework for adaptive hardware Trojan insertion, together with a new GNN-based detector intended to improve detection of diverse, functionally correct Trojans.

Findings

01. Detection rates rise from 0% to 60% on challenging benchmarks.

02. Generated Trojans reach evasion rates of up to 83.33%.

03. The generated benchmarks reveal robustness gaps in current detectors.
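To make the evasion-rate figure concrete, here is a minimal sketch of how such a rate could be computed, assuming it is defined as the share of Trojan-inserted designs that a detector fails to flag. The function name `evasion_rate` and the six-design sample are illustrative assumptions; the summary above does not state the evaluation set size or the exact metric definition.

```python
from typing import Sequence


def evasion_rate(flagged: Sequence[bool]) -> float:
    """Percentage of Trojan-inserted designs the detector did NOT flag.

    `flagged[i]` is True when the detector flagged design i as infected.
    Assumed definition; the paper may define the metric differently.
    """
    if not flagged:
        raise ValueError("need at least one detector prediction")
    missed = sum(1 for f in flagged if not f)
    return 100.0 * missed / len(flagged)


# Hypothetical sample: the detector misses 5 of 6 Trojan-inserted designs.
predictions = [False, False, False, False, False, True]
print(f"{evasion_rate(predictions):.2f}%")
```

Under this assumed definition, missing five of six designs gives 83.33%, which shows how a figure of that form can arise from a small evaluation set.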

Abstract

Hardware Trojans (HTs) remain a critical threat because learning-based detectors often overfit to narrow trigger/payload patterns and small, stylized benchmarks. We introduce TrojanGYM, an agentic, LLM-driven framework that automatically curates HT insertions to expose detector blind spots while preserving design correctness. Given high-level HT specifications, a suite of cooperating LLM agents (instantiated with GPT-4, LLaMA-3.3-70B, and Gemini-2.5Pro) proposes and refines RTL modifications that realize diverse triggers and payloads without impacting normal functionality. TrojanGYM implements a feedback-driven benchmark generation loop co-designed with HT detectors, in which constraint-aware syntactic checking and GNN-based HT detectors provide feedback that iteratively refines HT specifications and insertion strategies to better surface detector blind spots. We further propose…
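The feedback-driven loop described in the abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `propose`, `syntax_ok`, `toy_detector`, and the three RTL templates are hypothetical stand-ins for the LLM agents, the constraint-aware syntactic checker, and the GNN-based detector. The sketch does not check that normal functionality is preserved, which the real framework is said to do.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical RTL candidates standing in for LLM-proposed insertions.
TEMPLATES = [
    # Round 1: malformed RTL (the closing keyword is missing).
    "module t(input clk, input [31:0] d, output reg leak);\n"
    "  always @(posedge clk) leak <= (d == 32'hDEADBEEF);\n",
    # Round 2: well-formed, but uses a wide constant comparator as trigger.
    "module t(input clk, input [31:0] d, output reg leak);\n"
    "  always @(posedge clk) leak <= (d == 32'hDEADBEEF);\n"
    "endmodule\n",
    # Round 3: well-formed, counter-based trigger.
    "module t(input clk, output reg leak);\n"
    "  reg [3:0] cnt = 0;\n"
    "  always @(posedge clk) begin cnt <= cnt + 1; leak <= &cnt; end\n"
    "endmodule\n",
]


@dataclass
class LoopResult:
    rtl: Optional[str]    # evasive candidate, or None if none was found
    rounds: int           # number of proposals tried
    feedback: List[str]   # feedback accumulated for the proposer


def propose(round_idx: int, feedback: List[str]) -> str:
    """Stand-in for the LLM agents; a real proposer would condition on feedback."""
    return TEMPLATES[min(round_idx, len(TEMPLATES) - 1)]


def syntax_ok(rtl: str) -> bool:
    """Toy check: every 'module' has an 'endmodule' (not a real parser)."""
    ends = rtl.count("endmodule")
    starts = rtl.count("module") - ends  # 'endmodule' also contains 'module'
    return ends > 0 and starts == ends


def toy_detector(rtl: str) -> bool:
    """Toy detector: flags any 32-bit hex constant as a suspected trigger."""
    return "32'h" in rtl


def detector_in_the_loop(max_rounds: int = 5) -> LoopResult:
    feedback: List[str] = []
    for i in range(max_rounds):
        rtl = propose(i, feedback)
        if not syntax_ok(rtl):
            feedback.append("syntax: unbalanced module/endmodule")
            continue
        if toy_detector(rtl):
            feedback.append("detected: constant-comparator trigger")
            continue
        return LoopResult(rtl, i + 1, feedback)  # passed both checks
    return LoopResult(None, max_rounds, feedback)


result = detector_in_the_loop()
print(result.rounds, result.feedback)
```

In this toy run the first candidate fails the syntax check, the second is flagged by the detector, and the third passes both. The point is only the control flow: rejected candidates produce feedback, and the proposer tries again until a candidate survives or the round budget runs out.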

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Security and Verification in Computing · Adversarial Robustness in Machine Learning