Teaching Autoregressive Language Models Complex Tasks By Demonstration

Gabriel Recchia

arXiv:2109.02102 · cs.CL · December 6, 2021

TL;DR

Fine-tuning GPT-Neo on structured step-by-step demonstrations teaches it to perform longhand modulo operations, reaching over 80% accuracy with only 200 training examples.

Contribution

This work shows that small, well-structured demonstration datasets can significantly improve autoregressive models' ability to perform complex tasks without changing the underlying learning algorithm.

Findings

01

After fine-tuning, the smallest available GPT-Neo model achieves over 80% accuracy on the numbers__div_remainder task from the DeepMind Mathematics Dataset.

02

Structured step-by-step demonstrations make the difference on this multi-step task: Saxton et al. (arXiv:1904.01557) reported below 40% accuracy on it with 2 million training examples.

03

A dataset of 200 appropriately structured demonstrations is enough to teach the skill, with no changes to the learning algorithm.

Abstract

This paper demonstrates that by fine-tuning an autoregressive language model (GPT-Neo) on appropriately structured step-by-step demonstrations, it is possible to teach it to execute a mathematical task that has previously proved difficult for Transformers - longhand modulo operations - with a relatively small number of examples. Specifically, we fine-tune GPT-Neo to solve the numbers__div_remainder task from the DeepMind Mathematics Dataset; Saxton et al. (arXiv:1904.01557) reported below 40% accuracy on this task with 2 million training examples. We show that after fine-tuning on 200 appropriately structured demonstrations of solving long division problems and reporting the remainders, the smallest available GPT-Neo model achieves over 80% accuracy. This is achieved by constructing an appropriate dataset for fine-tuning, with no changes to the learning algorithm. These results suggest…
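The exact demonstration format used in the paper is not reproduced on this page. As a rough illustration of what a step-by-step long division demonstration could look like, the Python sketch below renders a digit-by-digit trace that ends with the remainder. The function name `long_division_demo` and the text layout are assumptions made for illustration only, not the paper's format.

```python
def long_division_demo(dividend: int, divisor: int) -> str:
    """Render a longhand division trace that ends with the remainder.

    NOTE: the text layout is a hypothetical format chosen for illustration;
    it is not the demonstration format used in the paper.
    """
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")

    lines = [f"Problem: {dividend} mod {divisor}"]
    remainder = 0
    for digit in str(dividend):
        # Bring down the next digit of the dividend.
        current = remainder * 10 + int(digit)
        # How many times the divisor fits into the running value.
        quotient_digit = current // divisor
        remainder = current - quotient_digit * divisor
        lines.append(
            f"Bring down {digit}: {current} = {divisor} * {quotient_digit} + {remainder}"
        )
    lines.append(f"Answer: {remainder}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(long_division_demo(157, 12))
```

For `long_division_demo(157, 12)` the trace reads:

```
Problem: 157 mod 12
Bring down 1: 1 = 12 * 0 + 1
Bring down 5: 15 = 12 * 1 + 3
Bring down 7: 37 = 12 * 3 + 1
Answer: 1
```

Writing out every intermediate step this way means each generated token depends only on a small, local computation, which is one plausible reading of why structured demonstrations help an autoregressive model.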

Code & Models

Repositories

mesotron/teaching_transformers (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification

Methods: GPT-Neo