ShadowLogic: Backdoors in Any Whitebox LLM

Kasimir Schulz; Amelia Kawasaki; Leo Ring

arXiv:2511.00664·cs.CR·November 4, 2025

ShadowLogic: Backdoors in Any Whitebox LLM

Kasimir Schulz, Amelia Kawasaki, Leo Ring

PDF

Open Access

TL;DR

ShadowLogic reveals a vulnerability in large language models where backdoors can be covertly embedded into their computational graphs, allowing malicious content removal when triggered by specific phrases, with minimal model alterations.

Contribution

We introduce ShadowLogic, a novel method for embedding undetectable backdoors into white-box LLMs via computational graph manipulation, enabling controlled content uncensoring.

Findings

01

Achieved over 60% success rate in activating backdoors.

02

Successfully implemented ShadowLogic in Phi-3 and Llama 3.2 models.

03

Demonstrated minimal parameter changes can embed effective backdoors.

Abstract

Large language models (LLMs) are widely deployed across various applications, often with safeguards to prevent the generation of harmful or restricted content. However, these safeguards can be covertly bypassed through adversarial modifications to the computational graph of a model. This work highlights a critical security vulnerability in computational graph-based LLM formats, demonstrating that widely used deployment pipelines may be susceptible to obscured backdoors. We introduce ShadowLogic, a method for creating a backdoor in a white-box LLM by injecting an uncensoring vector into its computational graph representation. We set a trigger phrase that, when added to the beginning of a prompt into the LLM, applies the uncensoring vector and removes the content generation safeguards in the model. We embed trigger logic directly into the computational graph which detects the trigger…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks