Teaching Transformers Causal Reasoning through Axiomatic Training

Aniket Vashishtha; Abhinav Kumar; Atharva Pandey; Abbavaram Gowtham Reddy; Kabir Ahuja; Vineeth N Balasubramanian; Amit Sharma

arXiv:2407.07612 · cs.LG · October 27, 2025 · 1 citation

TL;DR

This paper introduces an axiomatic training approach for teaching transformers causal reasoning from symbolic demonstrations, enabling generalization to complex scenarios and improving performance on causal benchmarks.

Contribution

It presents a novel axiomatic training method for transformers, demonstrating effective generalization and state-of-the-art results on causal reasoning benchmarks.

Findings

01. Models trained on causal axioms generalize to complex graphs

02. Axiomatic training improves performance on causal benchmarks

03. Finetuned language models surpass GPT-4 on some causal tasks

Abstract

For text-based AI systems to interact in the real world, causal reasoning is an essential skill. Since active interventions are costly, we study to what extent a system can learn causal reasoning from symbolic demonstrations of causal axioms. Specifically, we present an axiomatic training method where the system learns from multiple demonstrations of a causal axiom (or rule), rather than incorporating the axiom as an inductive bias or inferring it from data values. A key question is whether the system would learn to generalize from the axiom demonstrations to more complex scenarios. Our results, based on applying axiomatic training to learn the transitivity axiom and d-separation rule, indicate that such generalization is possible. To avoid data contamination issues, we start with a 67 million parameter transformer model and train it from scratch. On both tasks, we find that a model…
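The idea of learning from symbolic demonstrations of the transitivity axiom can be illustrated with a small data generator. The sketch below is only an assumption about what such demonstrations might look like: the sentence template, the node names (`X0`, `X1`, ...), the restriction to linear chains, and the helper names `make_chain`, `causes`, and `make_demonstration` are all hypothetical and are not taken from the paper.

```python
import random


def make_chain(num_nodes, rng):
    """Return a random linear causal chain as an ordered list of node names.

    The list ['X2', 'X0', 'X1'] stands for the chain X2 -> X0 -> X1.
    """
    names = [f"X{i}" for i in range(num_nodes)]
    rng.shuffle(names)
    return names


def causes(chain, src, dst):
    """In a linear chain, src causes dst iff src appears earlier (transitivity)."""
    return chain.index(src) < chain.index(dst)


def make_demonstration(chain, rng):
    """Build one (text, label) pair: a premise listing edges, then a query.

    The wording of the premise and question is a hypothetical template.
    """
    src, dst = rng.sample(chain, 2)
    premise = " ".join(f"{a} causes {b}." for a, b in zip(chain, chain[1:]))
    question = f"Does {src} cause {dst}?"
    label = "Yes" if causes(chain, src, dst) else "No"
    return f"{premise} {question}", label


if __name__ == "__main__":
    rng = random.Random(0)
    # Short chains for training; longer chains would probe generalization.
    for length in (3, 4, 6):
        text, label = make_demonstration(make_chain(length, rng), rng)
        print(text, "->", label)
```

Training on short chains and evaluating on longer ones is one simple way to probe the generalization question the abstract raises; the specific lengths used here are arbitrary.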

Videos

Teaching Transformers Causal Reasoning through Axiomatic Training · SlidesLive

Taxonomy

Topics: Teaching and Learning Programming

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax