Interpretable-by-Design Transformers via Architectural Stream Independence
Clayton Kerce, Alexis Fox

TL;DR
This paper proposes a transformer design called the Late Fusion Architecture (LFA) that enforces interpretability by keeping token and semantic streams separate until the output, with the aim of more modular, stable, and semantically focused models.
Contribution
The paper introduces the LFA model that enforces interpretability through stream independence, validated by new metrics and intervention experiments showing improved modularity and semantic understanding.
Findings
LFA maintains interpretable symbolic heads across layers.
Interventions on LFA heads cause minimal semantic disruption.
LFA achieves higher stability and semantic focus compared to standard transformers.
Abstract
While transformers achieve strong performance, their internal decision-making processes remain opaque. We investigate whether architectural constraints can enforce interpretability by design through architectural stream independence: maintaining a token stream (carrying symbolic structure) and contextual semantics in separate streams that remain independently observable throughout processing, with integration delayed until output. We validate this principle through the Late Fusion Architecture (LFA), which demonstrates interpretable symbolic heads through all the final layers, while standard transformers show dissolution by the third of six layers; we quantify this effect by introducing the Token-Position Dependence Score (PDS), with values of 0.276 and 0.058, respectively. Crucially, intervention experiments demonstrate functional modularity: suppressing LFA's recency heads causes…
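To make the stream-independence idea in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract does not say how LFA computes attention or performs fusion, so everything here is an assumption chosen for illustration, including the function name `late_fusion_forward`, the random weights, the choice to score attention from the token stream alone, and fusion by simple addition. The one property it is meant to show is that layers write only to the context stream, so the token stream stays readable on its own until the final step.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights; a stand-in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_forward(token_ids, vocab_size=10, dim=4, n_layers=3, seed=0):
    """Hypothetical two-stream forward pass (illustrative assumptions only)."""
    rng = random.Random(seed)
    embed = make_matrix(vocab_size, dim, rng)

    # Token stream: set once from the embeddings and never written to again.
    token_stream = [list(embed[t]) for t in token_ids]
    # Context stream: the only state that layers are allowed to update.
    context_stream = [[0.0] * dim for _ in token_ids]

    for _ in range(n_layers):
        w_value = make_matrix(dim, dim, rng)
        new_context = []
        for i in range(len(token_ids)):
            # Assumption: causal attention scores come from the token stream only.
            weights = softmax(
                [dot(token_stream[i], token_stream[j]) for j in range(i + 1)]
            )
            mixed = [0.0] * dim
            for j, weight in enumerate(weights):
                # Values may read both streams, but the result is written
                # to the context stream alone.
                source = [t + c for t, c in zip(token_stream[j], context_stream[j])]
                value = matvec(w_value, source)
                mixed = [m + weight * v for m, v in zip(mixed, value)]
            new_context.append([c + m for c, m in zip(context_stream[i], mixed)])
        context_stream = new_context

    # Late fusion: the two streams are combined only here, at the output.
    fused = [
        [t + c for t, c in zip(tok, ctx)]
        for tok, ctx in zip(token_stream, context_stream)
    ]
    return token_stream, context_stream, fused, embed
```

Calling `late_fusion_forward([1, 3, 3, 7])` returns both streams alongside the fused output, so the token stream can be inspected directly and compared against the embedding table after any number of layers.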
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
