OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

Charles Luo

arXiv:2603.07431 · cs.LG · March 17, 2026

TL;DR

OrthoFormer introduces a causally grounded Transformer architecture that embeds instrumental variable estimation via neural control functions, addressing the limitations of correlation-based learning and improving robustness and interpretability in sequential modeling.

Contribution

The paper proposes a Transformer design that incorporates instrumental variable estimation through neural control functions, grounds it in four theoretical pillars, and validates it both theoretically and empirically.
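The control-function idea behind this design can be illustrated with its classical linear ancestor. The sketch below is not the paper's method: it assumes a synthetic linear model (hypothetical functions `simulate`, `ols_slope`, `control_function_fit`) in which a latent confounder `u` drives both the regressor `x` and the outcome `y`, while the instrument `z` shifts `x` but is independent of `u`. Stage 1 regresses `x` on `z`; its residual `v` is orthogonal to `z` by construction and carries the confounded part of `x`. Stage 2 adds `v` as an extra regressor, which removes the confounding bias that plain OLS retains. A neural control function would presumably replace the linear first stage with a learned network.

```python
import numpy as np

def simulate(n=50_000, beta=2.0, seed=0):
    """Hypothetical linear structural model with a latent confounder u."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                        # instrument: shifts x, independent of u
    u = rng.normal(size=n)                        # latent confounder (unobserved)
    x = z + u + rng.normal(size=n)                # endogenous regressor
    y = beta * x + 2.0 * u + rng.normal(size=n)   # outcome; u also enters y directly
    return z, x, y

def ols_slope(x, y):
    """OLS slope of y on x (with intercept); biased when x is correlated with u."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def control_function_fit(z, x, y):
    """Two-stage control-function estimate of the slope of y on x."""
    # Stage 1: project x onto the instrument; the residual v is orthogonal to z.
    Z = np.column_stack([np.ones_like(z), z])
    gamma = np.linalg.lstsq(Z, x, rcond=None)[0]
    v = x - Z @ gamma
    # Stage 2: add v as a control regressor so it absorbs the confounded variation.
    X = np.column_stack([np.ones_like(x), x, v])
    slope = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return slope, v

z, x, y = simulate()
b_cf, v = control_function_fit(z, x, y)
print(f"OLS slope: {ols_slope(x, y):.3f}  control-function slope: {b_cf:.3f}")
```

With the defaults (true slope 2.0), OLS should land near 2 + 2·Cov(x, u)/Var(x) = 2 + 2/3 ≈ 2.67, while the control-function slope should land close to 2.0. In this linear case the control-function slope coincides with the two-stage least squares estimate.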

Findings

01

OrthoFormer achieves lower bias than OLS when the instruments are valid.

02

Residual bias decays geometrically with lag.

03

Experimental results confirm the theoretical predictions.
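Finding 02 states that residual bias decays geometrically with lag. The excerpt does not give the model behind this result, so the following is only an illustrative assumption: if a latent confounder follows a stationary AR(1) process u_t = ρ u_{t-1} + η_t with |ρ| < 1, its correlation with its own value k steps earlier is exactly ρ^k. For example, a lagged signal that depends on the confounder only through u_{t-k} then shares covariance proportional to ρ^k with the current confounder u_t. The hypothetical helpers `ar1_series` and `lag_corr` check the ρ^k pattern empirically.

```python
import numpy as np

def ar1_series(n, rho, seed=0):
    """Simulate a stationary AR(1) process u_t = rho * u_{t-1} + eta_t."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)
    u = np.empty(n)
    u[0] = eta[0] / np.sqrt(1.0 - rho**2)   # start in the stationary distribution
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eta[t]
    return u

def lag_corr(u, k):
    """Sample correlation between u_t and u_{t-k}."""
    return np.corrcoef(u[:-k], u[k:])[0, 1]

rho = 0.7
u = ar1_series(200_000, rho)
for k in (1, 2, 4, 8):
    # The empirical lag-k correlation should track the theoretical rho**k.
    print(k, round(lag_corr(u, k), 3), round(rho**k, 3))
```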

Abstract

Transformer architectures excel at sequential modeling yet remain fundamentally limited by correlational learning: they capture spurious associations induced by latent confounders rather than invariant causal mechanisms. We identify this as an epistemological challenge: standard Transformers conflate static background factors (intrinsic identity, style, context) with dynamic causal flows (state evolution, mechanism), leading to catastrophic out-of-distribution failure. We propose OrthoFormer, a causally grounded architecture that embeds instrumental variable estimation directly into Transformer blocks via neural control functions. Our framework rests on four theoretical pillars: Structural Directionality (time-arrow enforcement), Representation Orthogonality (latent-noise separation), Causal Sparsity (Markov Blanket approximation), and End-to-End Consistency (gradient-detached stage…
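Structural Directionality is described here only as time-arrow enforcement. In standard Transformers that property is usually implemented with a causal attention mask, so the sketch below assumes that mechanism; it is not taken from the paper. The hypothetical `causal_self_attention` computes single-head attention in NumPy and sets the scores for future positions to -inf before the softmax, so the output at step t depends only on inputs at steps up to and including t.

```python
import numpy as np

def causal_self_attention(h, Wq, Wk, Wv):
    """Single-head self-attention over h (shape T x d) with a causal mask."""
    T = h.shape[0]
    q, key, val = h @ Wq, h @ Wk, h @ Wv
    scores = q @ key.T / np.sqrt(key.shape[1])
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True where column > row
    scores = np.where(future, -np.inf, scores)          # block attention to the future
    scores -= scores.max(axis=1, keepdims=True)         # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ val

rng = np.random.default_rng(0)
T, d = 6, 4
h = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(h, Wq, Wk, Wv)

h_future = h.copy()
h_future[4:] += 10.0                                    # change only steps 4 and 5
out_future = causal_self_attention(h_future, Wq, Wk, Wv)
print(np.allclose(out[:4], out_future[:4]))
```

The final check illustrates the time-arrow property: perturbing inputs at later steps leaves the outputs for earlier steps unchanged, while the outputs at the perturbed steps do change.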

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition