OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions
Charles Luo

TL;DR
OrthoFormer introduces a causally grounded Transformer architecture that embeds instrumental variable estimation via neural control functions, addressing the limitations of correlation-based learning and improving robustness and interpretability in sequential modeling.
Contribution
It proposes a novel Transformer design incorporating instrumental variable estimation principles through neural control functions, grounded in four theoretical pillars, and provides theoretical and empirical validation.
Findings
With valid instruments, OrthoFormer achieves lower bias than OLS.
Residual bias decays geometrically with lag.
Experimental results confirm theoretical predictions.
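The first finding can be illustrated with the classical linear control function estimator, which is the textbook idea that neural control functions generalize. The sketch below is not OrthoFormer's implementation: it is a minimal NumPy simulation in which the data-generating process, the coefficient values, and the function names (`simulate`, `ols_slope`, `control_function_slope`) are all assumptions made for illustration. Stage 1 regresses the endogenous regressor on the instrument and keeps the residual; stage 2 adds that residual as an extra regressor so it can absorb the confounded variation.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Draw data with a latent confounder u and an instrument z (assumed toy model)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # instrument: moves x, no direct path to y
    u = rng.normal(size=n)                      # latent confounder: affects both x and y
    x = z + u + 0.5 * rng.normal(size=n)        # endogenous regressor
    y = beta * x + 2.0 * u + rng.normal(size=n) # outcome; beta is the causal effect
    return z, x, y


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def control_function_slope(z, x, y):
    """Two-stage linear control function estimate of the effect of x on y."""
    # Stage 1: project x on the instrument and keep the residual v_hat.
    Z = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    v_hat = x - Z @ gamma
    # Stage 2: regress y on x and v_hat; v_hat soaks up the confounded part of x.
    X2 = np.column_stack([np.ones_like(x), x, v_hat])
    coef, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return coef[1]


if __name__ == "__main__":
    z, x, y = simulate()
    print("true effect:      2.0")
    print("OLS estimate:    ", round(float(ols_slope(x, y)), 3))
    print("control function:", round(float(control_function_slope(z, x, y)), 3))
```

In this toy setup the OLS slope is pulled away from the true effect because `u` drives both `x` and `y`, while the control function estimate stays close to it as long as `z` is a valid instrument. In a neural version, the first-stage residual would typically be treated as a fixed input to the second stage, which is one plausible reading of the "gradient-detached" stages mentioned in the abstract.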
Abstract
Transformer architectures excel at sequential modeling yet remain fundamentally limited by correlational learning: they capture spurious associations induced by latent confounders rather than invariant causal mechanisms. We identify this as an epistemological challenge: standard Transformers conflate static background factors (intrinsic identity, style, context) with dynamic causal flows (state evolution, mechanism), leading to catastrophic out-of-distribution failure. We propose OrthoFormer, a causally grounded architecture that embeds instrumental variable estimation directly into Transformer blocks via neural control functions. Our framework rests on four theoretical pillars: Structural Directionality (time-arrow enforcement), Representation Orthogonality (latent-noise separation), Causal Sparsity (Markov Blanket approximation), and End-to-End Consistency (gradient-detached stage…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition
