Improving Length-Generalization in Transformers via Task Hinting
Pranjal Awasthi, Anupam Gupta

TL;DR
This paper introduces task hinting, a multitask training approach that significantly enhances length generalization in transformers, demonstrated on sorting tasks and potentially applicable to other reasoning and arithmetic problems.
Contribution
The work proposes a novel multitask training framework with task hinting to improve length generalization in transformers, supported by theoretical insights and extensive experiments.
Findings
Task hinting improves test accuracy from <1% to >92% on long sequences.
Effectiveness of auxiliary tasks varies significantly.
Introducing length-dependent parameters further boosts performance.
Abstract
It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to a certain length (e.g., 5 digit numbers) drops sharply when applied to longer instances of the same problem. This work proposes an approach based on task hinting to address length generalization. Our key idea is that while training the model on task-specific data, it is helpful to simultaneously train the model to solve a simpler but related auxiliary task as well. We study the classical sorting problem as a canonical example to evaluate our approach. We design a multitask training framework and show that task hinting significantly improves length generalization. For sorting we show that it is possible to train models on data consisting of…
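The multitask idea described in the abstract can be illustrated with a small data-generation sketch. This is not the authors' implementation: the task tags, the choice of "find the minimum" as the auxiliary task, the `aux_fraction` mixing ratio, and the function names are all assumptions made for illustration. It only shows how main-task (sorting) and auxiliary-task examples might be mixed in one training batch.

```python
import random

# Hypothetical task tags; the auxiliary task here ("min") is an illustrative
# assumption, not necessarily one the paper uses.
MAIN_TASK = "sort"
AUX_TASK = "min"


def make_example(task, seq):
    """Return (input_tokens, target_tokens) for one task-tagged example."""
    if task == MAIN_TASK:
        target = sorted(seq)
    elif task == AUX_TASK:
        target = [min(seq)]
    else:
        raise ValueError(f"unknown task: {task}")
    # The task tag is prepended so a single model can tell the tasks apart.
    return [task] + list(seq), target


def make_mixed_batch(rng, batch_size, max_len, aux_fraction=0.5, vocab=100):
    """Sample a batch mixing main-task and auxiliary-task examples."""
    batch = []
    for _ in range(batch_size):
        length = rng.randint(2, max_len)  # inclusive on both ends
        seq = [rng.randrange(vocab) for _ in range(length)]
        task = AUX_TASK if rng.random() < aux_fraction else MAIN_TASK
        batch.append(make_example(task, seq))
    return batch
```

Under these assumptions, a length-generalization experiment would draw training batches with a small `max_len` and then evaluate only the main task on sequences longer than any seen in training.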
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Topic Modeling
