Cross-LLM Generalization of Behavioral Backdoor Detection in AI Agent Supply Chains

Arun Chowdary Sanna

arXiv:2511.19874 · cs.CR · November 26, 2025

TL;DR

This study examines whether behavioral backdoor detectors trained on one large language model (LLM) transfer to others. It finds a large generalization gap (92.7% accuracy within the training distribution versus 49.2% across LLMs) and proposes a model-aware detection method that reaches 90.6% accuracy across models.

Contribution

The paper presents the first systematic analysis of cross-LLM behavioral backdoor detection, identifies key behavioral signatures, and introduces a model-aware detection approach.

Findings

01. Single-model detectors achieve 92.7% accuracy within their training distribution.

02. Cross-LLM detection accuracy drops to 49.2%, indicating a large generalization gap.

03. Model-aware detection with model identity features reaches 90.6% accuracy across models.
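
One way to picture "model identity features" is as a one-hot indicator of the generating LLM appended to each trace's behavioral feature vector. The sketch below is only an illustration under that assumption: the function names and the feature layout are hypothetical, and this page does not describe how the paper actually encodes model identity.

```python
# Hypothetical sketch: function names and feature layout are illustrative
# assumptions, not the paper's implementation.

# The six LLMs named in the abstract.
MODELS = [
    "GPT-5.1",
    "Claude Sonnet 4.5",
    "Grok 4.1",
    "Llama 4 Maverick",
    "GPT-OSS 120B",
    "DeepSeek Chat V3.1",
]


def one_hot_model(model_name):
    """Return a one-hot vector marking which LLM produced a trace."""
    if model_name not in MODELS:
        raise ValueError(f"unknown model: {model_name}")
    return [1.0 if name == model_name else 0.0 for name in MODELS]


def model_aware_features(behavioral_features, model_name):
    """Append the model identity encoding to a trace's behavioral features."""
    return list(behavioral_features) + one_hot_model(model_name)
```

A detector trained on such vectors can, in principle, learn a separate decision boundary per model while still sharing what is common across them.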

Abstract

As AI agents become integral to enterprise workflows, their reliance on shared tool libraries and pre-trained components creates significant supply chain vulnerabilities. While previous work has demonstrated behavioral backdoor detection within individual LLM architectures, the critical question of cross-LLM generalization remains unexplored, a gap with serious implications for organizations deploying multiple AI systems. We present the first systematic study of cross-LLM behavioral backdoor detection, evaluating generalization across six production LLMs (GPT-5.1, Claude Sonnet 4.5, Grok 4.1, Llama 4 Maverick, GPT-OSS 120B, and DeepSeek Chat V3.1). Through 1,198 execution traces and 36 cross-model experiments, we quantify a critical finding: single-model detectors achieve 92.7% accuracy within their training distribution but only 49.2% across different LLMs, a 43.4 percentage point…
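
The experiment count in the abstract is consistent with a full train/test grid over the six LLMs (6 × 6 = 36 ordered pairs, of which 6 are same-model and 30 are cross-model). The following sketch assumes that reading, and assumes the reported figures are averages over the same-model and cross-model cells respectively; the names `accuracy_by_pair` and `generalization_gap` are hypothetical.

```python
from itertools import product

# The six LLMs named in the abstract.
MODELS = [
    "GPT-5.1",
    "Claude Sonnet 4.5",
    "Grok 4.1",
    "Llama 4 Maverick",
    "GPT-OSS 120B",
    "DeepSeek Chat V3.1",
]

# Assumption: one experiment per ordered (train_model, test_model) pair.
PAIRS = list(product(MODELS, repeat=2))
SAME_MODEL = [(train, test) for train, test in PAIRS if train == test]
CROSS_MODEL = [(train, test) for train, test in PAIRS if train != test]


def mean_accuracy(accuracy_by_pair, pairs):
    """Average accuracy (in percent) over the given (train, test) pairs."""
    return sum(accuracy_by_pair[pair] for pair in pairs) / len(pairs)


def generalization_gap(accuracy_by_pair):
    """Same-model mean accuracy minus cross-model mean accuracy, in points."""
    return (mean_accuracy(accuracy_by_pair, SAME_MODEL)
            - mean_accuracy(accuracy_by_pair, CROSS_MODEL))
```

Note that subtracting the rounded figures quoted above (92.7 and 49.2) gives 43.5 points; the 43.4 stated in the abstract presumably reflects unrounded accuracies.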

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Advanced Malware Detection Techniques