The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models
Jonathan Pan

TL;DR
This paper presents the Quantum Sieve Tracer, a hybrid quantum-classical framework for layer-wise activation analysis in large language models. Applied to Llama-3.2-1B and Qwen2.5-1.5B-Instruct, it finds that the two models handle factual recall with functionally different circuits.
Contribution
It introduces a novel hybrid quantum-classical method for mechanistic interpretability, enabling high-resolution analysis of attention mechanisms in LLMs.
Findings
Qwen's layer 7 functions as a Recall Hub.
Llama's layer 9 acts as an Interference Suppression circuit.
Quantum kernels distinguish constructive and reductive attention mechanisms.
Abstract
Mechanistic interpretability aims to reverse-engineer the internal computations of Large Language Models (LLMs), yet separating sparse semantic signals from high-dimensional polysemantic noise remains a significant challenge. This paper introduces the Quantum Sieve Tracer, a hybrid quantum-classical framework designed to characterize factual recall circuits. We implement a modular pipeline that first localizes critical layers using classical causal tracing, then maps specific attention head activations into an exponentially large quantum Hilbert space. Using open-weight models (Meta Llama-3.2-1B and Alibaba Qwen2.5-1.5B-Instruct), we perform a two-stage analysis that reveals a fundamental architectural divergence. While Qwen's layer 7 circuit functions as a classic Recall Hub, we discover that Llama's layer 9 acts as an Interference Suppression circuit, where ablating the identified…
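The second stage described above, mapping activations into a Hilbert space whose dimension grows exponentially with the number of qubits and comparing them there, can be illustrated with a small classical simulation. The sketch below is not the paper's circuit: the abstract does not specify the feature map, so it assumes a simple angle encoding (one RY rotation per feature) and a fidelity kernel. The function names and the sample vectors are hypothetical and stand in for reduced attention-head activations.

```python
import math

# Assumption: angle encoding with one qubit per feature. The paper's actual
# feature map is not given in the abstract; this is only an illustration.

def qubit_state(angle):
    """Amplitudes of RY(angle)|0> for a single qubit."""
    return (math.cos(angle / 2.0), math.sin(angle / 2.0))

def feature_state(features):
    """Tensor product of the per-feature qubit states.

    For n features the result has 2**n amplitudes, which is the
    exponential growth of the Hilbert space with qubit count.
    """
    state = [1.0]
    for angle in features:
        a0, a1 = qubit_state(angle)
        state = [amp * a for amp in state for a in (a0, a1)]
    return state

def quantum_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 between two encoded vectors."""
    sx, sy = feature_state(x), feature_state(y)
    overlap = sum(a * b for a, b in zip(sx, sy))  # amplitudes are real here
    return overlap ** 2

def gram_matrix(samples):
    """Pairwise kernel values, usable by any classical kernel method."""
    return [[quantum_kernel(a, b) for b in samples] for a in samples]

# Hypothetical, already-rescaled activation vectors (three features each).
head_a = [0.10, 0.80, 1.20]
head_b = [0.12, 0.75, 1.25]
head_c = [2.90, 0.10, 0.05]

K = gram_matrix([head_a, head_b, head_c])
```

With this encoding the kernel has the closed form k(x, y) = prod_i cos^2((x_i - y_i) / 2), so nearby activation vectors score close to 1 and distant ones close to 0. An entangling feature map would not factor this way, which is where a quantum device could in principle differ from the classical simulation shown here.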
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Misinformation and Its Impacts
