MOCA: A Transformer-based Modular Causal Inference Framework with One-way Cross-attention and Cutting Feedback
Lei Wang, Debashis Ghosh

TL;DR
MOCA is a transformer-based causal inference framework that models treatment and outcome in separate modules and adjusts for confounding with one-way cross-attention. It is reported to perform well on both simulated and real-world datasets.
Contribution
The paper introduces MOCA, a modular transformer architecture that combines one-way cross-attention with gradient detachment (cutting feedback) to estimate causal effects from observational data.
Findings
MOCA outperforms classical estimators such as inverse probability weighting (IPW) and augmented inverse probability weighting (AIPW) in multiple simulated scenarios.
MOCA achieves competitive results on real-world datasets such as the Infant Health and Development Program.
The modular design with one-way attention preserves causal directionality and interpretability.
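For readers unfamiliar with the baselines named above, the following is a minimal sketch of textbook IPW and AIPW estimators of the average treatment effect. It is not taken from the paper: the function names, the toy data, and the assumption that propensity scores `e` and outcome predictions `m1`, `m0` are already available are all illustrative.

```python
def ipw_ate(y, t, e):
    """Horvitz-Thompson style IPW estimate of the average treatment effect.

    y: observed outcomes, t: binary treatment indicators (0/1),
    e: estimated propensity scores P(T=1 | X), each strictly in (0, 1).
    """
    n = len(y)
    return sum(ti * yi / ei - (1 - ti) * yi / (1 - ei)
               for yi, ti, ei in zip(y, t, e)) / n


def aipw_ate(y, t, e, m1, m0):
    """AIPW (doubly robust) estimate of the average treatment effect.

    m1, m0: outcome-model predictions under treatment and under control.
    """
    n = len(y)
    total = 0.0
    for yi, ti, ei, m1i, m0i in zip(y, t, e, m1, m0):
        total += (m1i - m0i                           # outcome-model contrast
                  + ti * (yi - m1i) / ei              # weighted residual, treated
                  - (1 - ti) * (yi - m0i) / (1 - ei)) # weighted residual, control
    return total / n


# Hypothetical toy data: four units, constant propensity of 0.5.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
e = [0.5, 0.5, 0.5, 0.5]
m1 = [3.5, 3.5, 3.5, 3.5]
m0 = [1.5, 1.5, 1.5, 1.5]

ipw_estimate = ipw_ate(y, t, e)
aipw_estimate = aipw_ate(y, t, e, m1, m0)
```

Both estimators divide by the propensity score, which is one reason they can become unstable when estimated propensities are close to 0 or 1.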
Abstract
Causal effect estimation from observational data requires careful adjustment for confounding. Classical estimators such as inverse probability weighting and augmented inverse probability weighting are effective under favorable model specification, but may become unstable when treatment assignment and outcome mechanisms are complex, non-linear, and high-dimensional. Machine learning and representation learning approaches improve flexibility, yet joint training can allow outcome-related information to influence treatment-side representations, which is undesirable from a causal perspective. We propose MOCA (Modular One-way Causal Attention), a transformer-based framework that separates treatment and outcome modeling through a modular design, and performs confounder adjustment using a one-way attention mechanism. A cutting-feedback strategy, implemented via gradient detachment, prevents the…
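The one-way attention idea described in the abstract can be illustrated with a small forward-pass sketch. This is one plausible reading, not the authors' implementation: it assumes outcome-side tokens act as queries and treatment-side tokens act as keys and values, uses identity projections instead of learned weights, and adds a residual connection. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def one_way_cross_attention(outcome_tokens, treatment_tokens):
    """Outcome tokens attend to treatment tokens, never the reverse.

    Assumption: identity query/key/value projections, single head.
    Returns the updated outcome tokens and the untouched treatment tokens.
    """
    d = len(treatment_tokens[0])
    updated = []
    for q in outcome_tokens:
        # Scaled dot-product scores of this outcome query against every treatment key.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in treatment_tokens])
        # Weighted average of treatment values.
        context = [sum(w * v[j] for w, v in zip(weights, treatment_tokens))
                   for j in range(d)]
        # Residual connection: outcome token plus attended treatment context.
        updated.append([qi + ci for qi, ci in zip(q, context)])
    # The treatment side is read only, so it is returned as given.
    return updated, treatment_tokens


# Hypothetical toy inputs: two treatment-side tokens, one outcome-side token.
treatment = [[1.0, 0.0], [0.0, 1.0]]
outcome_a = [[0.5, 0.5]]
outcome_b = [[9.0, -3.0]]

out_a, treat_after_a = one_way_cross_attention(outcome_a, treatment)
out_b, treat_after_b = one_way_cross_attention(outcome_b, treatment)
```

In this sketch the treatment-side representation is identical no matter which outcome tokens are supplied, which is the forward-pass half of the one-way property. The cutting-feedback half concerns gradients: in an autodiff framework, the treatment tokens would additionally be wrapped in a stop-gradient (for example `Tensor.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX) so that the outcome loss cannot update treatment-side parameters.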
