Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin, Zhaoran Wang

TL;DR
This paper investigates whether Transformers can perform compositional reasoning by connecting separated knowledge fragments, using a synthetic task to evaluate their ability to generalize and infer complete causal chains.
Contribution
The study introduces FTCT (Fragmented at Training, Chained at Testing), a synthetic task that tests whether Transformers can reason over fragmented knowledge, and shows that few-shot Chain-of-Thought prompting lets them generalize to chains they never saw whole during training.
Findings
Transformers can perform compositional reasoning on FTCT with few-shot prompting.
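The emergence of this ability depends on model complexity and on the similarity between training and testing data.
Transformers learn an underlying generalizable program during training, which enables compositional reasoning at test time.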
Abstract
Humans exhibit remarkable compositional reasoning by integrating knowledge from various sources. For example, if someone learns B = f(A) from one source and C = g(B) from another, they can deduce C = g(B) = g(f(A)) even without encountering ABC together, showcasing the generalization ability of human intelligence. In this paper, we introduce a synthetic learning task, "FTCT" (Fragmented at Training, Chained at Testing), to validate the potential of Transformers in replicating this skill and interpret its inner mechanism. In the training phase, data consist of separated knowledge fragments from an overall causal graph. During testing, Transformers must infer complete causal graph traces by integrating these fragments. Our findings demonstrate that few-shot Chain-of-Thought prompting enables Transformers to perform compositional reasoning on FTCT by revealing correct…
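The fragmented-versus-chained setup described in the abstract can be illustrated with a toy sketch. This is not the authors' dataset construction: the chain `A -> B -> C -> D`, the additive edge offsets, and the helper names `full_trace`, `training_fragments`, and `chain_fragments` are all assumptions made for illustration.

```python
# Toy causal chain (assumed for illustration): each edge adds a fixed offset
# to the parent's value to produce the child's value.
CHAIN = ["A", "B", "C", "D"]
EDGE_OFFSETS = {("A", "B"): 1, ("B", "C"): 2, ("C", "D"): 3}


def full_trace(start_value):
    """Return the complete trace [(node, value), ...] along the whole chain."""
    values = [start_value]
    for parent, child in zip(CHAIN, CHAIN[1:]):
        values.append(values[-1] + EDGE_OFFSETS[(parent, child)])
    return list(zip(CHAIN, values))


def training_fragments(start_value, fragment_len=2):
    """Return contiguous pieces of the trace, each shorter than the full chain.

    These play the role of the separated knowledge fragments seen in training.
    """
    trace = full_trace(start_value)
    return [trace[i:i + fragment_len]
            for i in range(len(trace) - fragment_len + 1)]


def chain_fragments(fragments):
    """Stitch fragments together wherever one ends at the node the next starts with.

    This is the test-time goal: recover the full trace from the pieces.
    """
    chained = list(fragments[0])
    for fragment in fragments[1:]:
        if fragment[0] != chained[-1]:
            raise ValueError("fragments do not connect")
        chained.extend(fragment[1:])
    return chained


if __name__ == "__main__":
    fragments = training_fragments(5)
    print(fragments)
    print(chain_fragments(fragments))
```

In this sketch no single training fragment covers the whole chain, so producing the full trace requires composing knowledge held in separate pieces. The stitching here is done by an explicit rule, whereas the paper asks whether a trained Transformer can do the equivalent from a few-shot prompt.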
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper investigates whether transformers are capable of generalizing to longer reasoning chains by connecting shorter ones seen during training, which is an interesting and important research question. 2. The paper is technically sound: the trained transformers behave compositionally (with few-shot chain-of-thought prompting), and the authors provide insights into their internal workings (induction heads and attention assignment), demonstrating that the transformer learns a generalizabl
1. Since the experimental setting is a randomly initialized transformer trained on synthetic data, it is questionable to what extent the paper's conclusions extend to real pre-trained language models. 2. The notation used in the paper is quite complicated, making the paper somewhat difficult for readers to follow.
- The design of the FTCT task is well-conceived, as it effectively mimics real-world scenarios where knowledge is often fragmented and must be integrated to draw comprehensive conclusions. This setup provides a meaningful and practical benchmark to evaluate the compositional reasoning abilities of Transformers, making the study relevant and valuable for advancing our understanding of machine learning models' capabilities. - Chapter 5, "transformer does compositional reasoning via the underlying
- While the task studied in this paper requires strong compositional generalization abilities, it is simple and singular in its form. Generally, using a simple and singular synthetic dataset is suitable for highlighting the shortcomings of the Transformer architecture. However, since the paper concludes that Transformers possess this capability, the experiments on this task alone are not sufficient to support such a conclusion. I believe that more diverse and comprehensive tasks are needed, and
- The paper presents a very intriguing and creative approach to testing the ability of models to learn compositional reasoning. - There are some really interesting results, specifically the exact complexity (and the increased expressivity) needed for the transformer architecture to optimally solve the FTCT task. - The insights regarding the few-shot CoT results are significant and spark further research in this area. - The empirical findings of how the transformer performs this tas
- The clarity of this paper is lacking, especially in the notation and writing. For instance, in Figure 1, there appears to be a typo in some of the values that contradicts the setup of the dataset. Separately, some concrete examples of the FTCT data (including noise and context tokens) would really improve the reader's understanding (it took me multiple re-reads to get the gist of the methodology). - The paper's definition of compositional reasoning should be explicitly written out in th
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Approximate Bayesian Computation
