Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
Tim Hartill, Neset Tan, Michael Witbrock, Patricia J. Riddle

TL;DR
This paper demonstrates that smaller language models can generalize to unseen compositional questions by combining multitask supervised pretraining with retrieval-augmented training, improving reasoning capabilities without relying solely on large models.
Contribution
The paper combines multitask supervised pretraining with retrieval-augmented training datasets to improve how well a smaller model generalises to unseen questions, a less explored setting in zero-shot reasoning than prompting very large models.
Findings
Performance improved with retrieval-augmented training datasets.
Strong baselines established across multiple diverse datasets.
Retrieval-based training enhances reasoning abilities in smaller models.
Abstract
We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training. To do so we propose a combination of multitask supervised pretraining on up to 93 tasks designed to instill diverse reasoning abilities, and a dense retrieval system that aims to retrieve a set of evidential paragraph fragments. Recent progress in question-answering has been achieved either through prompting methods against very large pretrained Language Models in zero or few-shot fashion, or by fine-tuning smaller models, sometimes in conjunction with information retrieval. We focus on the less explored question of the extent to which zero-shot generalisation can be enabled in smaller models with retrieval against a corpus within which sufficient information to answer a particular question may not exist. We establish strong baselines in this setting for…
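The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the bag-of-words `embed` function is a hypothetical stand-in for a learned dense encoder, and the function names, the `question: ... context: ...` input format, and the value of `k` are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in encoder (assumption): an L2-normalised bag-of-words vector.

    A real dense retriever would use a learned neural encoder here.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {tok: c / norm for tok, c in counts.items()}


def score(q_vec, p_vec):
    """Inner product between two sparse vectors stored as dicts."""
    return sum(w * p_vec.get(tok, 0.0) for tok, w in q_vec.items())


def retrieve(question, fragments, k=2):
    """Return the k paragraph fragments scoring highest against the question."""
    q_vec = embed(question)
    ranked = sorted(fragments, key=lambda f: score(q_vec, embed(f)), reverse=True)
    return ranked[:k]


def build_input(question, fragments, k=2):
    """Join the question with retrieved fragments (hypothetical input format)."""
    context = " ".join(retrieve(question, fragments, k))
    return f"question: {question} context: {context}"


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile flows through Egypt.",
        "France borders Spain and Italy.",
    ]
    print(build_input("What is the capital of France?", corpus, k=1))
```

Note that `retrieve` always returns the top `k` fragments, whether or not any of them actually contains the answer. That mirrors the setting in the abstract, where the corpus may not hold sufficient information for a given question and the model must reason over whatever evidence is retrieved.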
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
