Iterative Decoding for Compositional Generalization in Transformers
Luana Ruiz, Joshua Ainslie, Santiago Ontañón

TL;DR
This paper proposes iterative decoding for transformers, which improves compositional generalization by breaking tasks into intermediate steps, with demonstrated improvements on the PCFG and Cartesian product datasets.
Contribution
It introduces iterative decoding, an alternative to seq2seq that improves compositional generalization in transformers and outperforms seq2seq baselines on the PCFG and Cartesian product datasets.
Findings
Improves transformer compositional generalization on the PCFG and Cartesian product datasets
Provides evidence that, on these datasets, seq2seq transformers do not learn iterations that are not unrolled
Identifies limitations of iterative decoding on the CFQ dataset
Abstract
Deep learning models generalize well to in-distribution data but struggle to generalize compositionally, i.e., to combine a set of learned primitives to solve more complex tasks. In sequence-to-sequence (seq2seq) learning, transformers are often unable to predict correct outputs for longer examples than those seen at training. This paper introduces iterative decoding, an alternative to seq2seq that (i) improves transformer compositional generalization in the PCFG and Cartesian product datasets and (ii) evidences that, in these datasets, seq2seq transformers do not learn iterations that are not unrolled. In iterative decoding, training examples are broken down into a sequence of intermediate steps that the transformer learns iteratively. At inference time, the intermediate outputs are fed back to the transformer as intermediate inputs until an end-of-iteration token is predicted. We…
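The inference loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `<eoi>` marker, the `iterative_decode` and `toy_step` names, the `max_iters` guard, and the convention that the end-of-iteration token appears as the last output token are all assumptions made here, and `toy_step` is a hand-written stand-in for a trained transformer that resolves one operation per call.

```python
from typing import Callable, List, Tuple

EOI = "<eoi>"  # hypothetical end-of-iteration marker (name assumed for this sketch)

# Toy string operations, loosely in the spirit of PCFG-style tasks (assumed, not from the paper).
OPS = {
    "reverse": lambda args: args[::-1],
    "echo": lambda args: args + args[-1:],  # repeat the last argument
}


def toy_step(tokens: List[str]) -> List[str]:
    """Stand-in for one transformer call: resolve only the innermost operation."""
    op_positions = [i for i, tok in enumerate(tokens) if tok in OPS]
    if not op_positions:
        # Nothing left to resolve: signal that iteration should stop.
        return tokens + [EOI]
    i = op_positions[-1]
    return tokens[:i] + OPS[tokens[i]](tokens[i + 1:])


def iterative_decode(
    step: Callable[[List[str]], List[str]],
    tokens: List[str],
    max_iters: int = 50,
) -> Tuple[List[str], List[List[str]]]:
    """Feed each intermediate output back as the next input until EOI is predicted."""
    current = list(tokens)
    trace = [current]  # the input followed by every intermediate output
    for _ in range(max_iters):
        output = step(current)
        if output and output[-1] == EOI:
            return output[:-1], trace
        trace.append(output)
        current = output
    raise RuntimeError("no end-of-iteration token predicted within max_iters")


result, trace = iterative_decode(toy_step, ["reverse", "echo", "a", "b", "c"])
print(result)  # ['c', 'c', 'b', 'a']
```

In this sketch each call handles a single operation, so an input with more nested operations needs more iterations rather than a longer single-pass output. The `max_iters` cap is a safety guard added for the example in case the step function never emits the end-of-iteration marker.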
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
