Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains
Arun Chowdary Sanna

TL;DR
This study investigates how well behavioral backdoor detectors transfer across different large language models (LLMs). It finds a large generalization gap and proposes a model-aware detection method that raises cross-model accuracy to 90.6%.
Contribution
The paper presents the first systematic analysis of cross-LLM behavioral backdoor detection, identifies key behavioral signatures, and introduces a model-aware detection approach.
Findings
Single-model detectors achieve 92.7% accuracy within their training distribution.
Cross-LLM detection accuracy drops to 49.2%, indicating a large generalization gap.
Model-aware detection with model identity features reaches 90.6% accuracy across models.
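
To make the model-aware idea above concrete, here is a minimal sketch of one way to attach model identity features to a trace's behavioral features. It assumes a one-hot encoding over the six LLMs named in the abstract; the paper's actual encoding and feature set are not given in this summary. The function names `add_model_identity` and `train_test_pairs` and the example feature values are hypothetical. The grid helper reads the 36 experiments as a full 6 x 6 train/test matrix, which matches the count but is an assumption about the experimental design.

```python
from itertools import product

# The six LLMs named in the abstract.
MODELS = [
    "GPT-5.1", "Claude Sonnet 4.5", "Grok 4.1",
    "Llama 4 Maverick", "GPT-OSS 120B", "DeepSeek Chat V3.1",
]


def add_model_identity(features, model_name, models=MODELS):
    """Append a one-hot model-identity vector to a behavioral feature vector.

    Assumption: one-hot encoding; the summary does not say how the paper
    encodes model identity.
    """
    if model_name not in models:
        raise ValueError(f"unknown model: {model_name}")
    one_hot = [1.0 if m == model_name else 0.0 for m in models]
    return list(features) + one_hot


def train_test_pairs(models=MODELS):
    """Enumerate every (train_model, test_model) pair.

    Assumption: the 36 experiments form a full 6 x 6 grid, including the
    six same-model pairs.
    """
    return list(product(models, repeat=2))


# Hypothetical 3-dimensional behavioral feature vector for one trace.
trace_features = [0.12, 3.0, 0.5]
augmented = add_model_identity(trace_features, "Grok 4.1")

pairs = train_test_pairs()
cross_pairs = [(train, test) for train, test in pairs if train != test]
```

A detector trained on the augmented vectors can condition on which model produced a trace, instead of assuming that behavioral signatures look the same across models.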
Abstract
As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. While previous work has demonstrated behavioral backdoor detection within individual LLM architectures, the critical question of cross-LLM generalization remains unexplored, a gap with serious implications for organizations deploying multiple AI systems. We present the first systematic study of cross-LLM behavioral backdoor detection, evaluating generalization across six production LLMs (GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1). Through 1,198 execution traces and 36 cross-model experiments, we quantify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Advanced Malware Detection Techniques
