Transformer See, Transformer Do: Copying as an Intermediate Step in Learning Analogical Reasoning
Philipp Hellwig, Willem Zuidema, Claire E. Stevenson, Martha Lewis

TL;DR
This paper shows that transformers trained with Meta-Learning for Compositionality (MLC) on letter-string analogies can generalize to new alphabets and, to some extent, to compositions of transformations, particularly when the training data includes copying tasks and is more heterogeneous.
Contribution
The study applies Meta-Learning for Compositionality (MLC) to letter-string analogies to improve transformers' analogical reasoning and generalization, and uses interpretability analyses to link model behavior to an underlying algorithm.
Findings
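Including copying tasks in the training data guides models to attend to the most informative problem elements, which makes letter-string analogies learnable.
Generalization to new alphabets improves with more heterogeneous training data; in this setting the 3-layer encoder-decoder model outperforms most frontier models.
The approach enables some transfer to composed transformations, but not to entirely new ones.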
Abstract
Analogical reasoning is a hallmark of human intelligence, enabling us to solve new problems by transferring knowledge from one situation to another. Yet, developing artificial intelligence systems capable of robust human-like analogical reasoning has proven difficult. In this work, we train transformers using Meta-Learning for Compositionality (MLC) on an analogical reasoning task (letter-string analogies) and assess their generalization capabilities. We find that letter-string analogies become learnable when guiding the models to attend to the most informative problem elements induced by including copying tasks in the training data. Furthermore, generalization to new alphabets becomes better when models are trained with more heterogeneous datasets, where our 3-layer encoder-decoder model outperforms most frontier models. The MLC approach also enables some generalization to compositions…
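To make the task concrete, here is a minimal sketch of what a letter-string analogy item, a copying item, and a permuted alphabet might look like. It is an illustration only, not the authors' data pipeline: the prompt format, the "successor of the last letter" transformation, the definition of a copying item as one whose answer repeats the query string, and all function names are assumptions.

```python
import random
import string


def run(alphabet: str, start: int, length: int = 3) -> str:
    """Return `length` consecutive letters of `alphabet`, starting at index `start`."""
    return "".join(alphabet[(start + k) % len(alphabet)] for k in range(length))


def successor_last(s: str, alphabet: str) -> str:
    """Replace the final letter of `s` with the next letter in `alphabet` (wrapping)."""
    i = alphabet.index(s[-1])
    return s[:-1] + alphabet[(i + 1) % len(alphabet)]


def analogy_item(alphabet: str, src_start: int, tgt_start: int) -> tuple:
    """Build an 'A -> B ; C -> ?' prompt and its answer (assumed format)."""
    a, c = run(alphabet, src_start), run(alphabet, tgt_start)
    prompt = f"{a} -> {successor_last(a, alphabet)} ; {c} -> ?"
    return prompt, successor_last(c, alphabet)


def copy_item(alphabet: str, src_start: int, tgt_start: int) -> tuple:
    """Hypothetical copying item: the transformation is the identity,
    so the answer is the query string unchanged."""
    a, c = run(alphabet, src_start), run(alphabet, tgt_start)
    return f"{a} -> {a} ; {c} -> ?", c


def shuffled_alphabet(seed: int) -> str:
    """A permuted alphabet, standing in for a 'new alphabet' at test time."""
    letters = list(string.ascii_lowercase)
    random.Random(seed).shuffle(letters)
    return "".join(letters)


print(analogy_item(string.ascii_lowercase, 0, 8))  # ('abc -> abd ; ijk -> ?', 'ijl')
print(copy_item(string.ascii_lowercase, 0, 8))     # ('abc -> abc ; ijk -> ?', 'ijk')
print(analogy_item(shuffled_alphabet(0), 3, 10))   # same rule under a permuted ordering
```

Under these assumptions, solving the copying item only requires locating the query string in the prompt, which is one plausible reading of how copying could direct attention to the informative elements. The permuted alphabet shows why generalizing to a new alphabet requires applying the rule relative to the given ordering rather than relying on memorized letter pairs.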
