Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

Hao Liu; Xinyang Geng; Lisa Lee; Igor Mordatch; Sergey Levine; Sharan Narang; Pieter Abbeel

arXiv:2210.13432 · cs.CL · February 1, 2023 · 1 citation

TL;DR

This paper introduces Forgetful Causal Masking (FCM), a simple technique that improves the performance of large language models in few-shot and finetuning tasks by randomly masking past tokens during training.
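
A minimal sketch of the masking described above, written in Python for illustration only. It starts from an ordinary causal attention mask and hides a random subset of past positions from all later queries. The function name `forgetful_causal_mask`, the `max_ratio` parameter, drawing the mask ratio uniformly from [0, max_ratio] once per sequence, never masking position 0, and always letting a position attend to itself are assumptions of this sketch, not details taken from the paper's code.

```python
import random


def forgetful_causal_mask(seq_len, max_ratio=0.15, rng=None):
    """Return a seq_len x seq_len boolean attention mask (list of lists).

    mask[i][j] is True when query position i may attend to key position j.
    Assumptions of this sketch (not from the paper's code):
      * the mask ratio is drawn uniformly from [0, max_ratio] per sequence;
      * position 0 is never masked;
      * every position can always attend to itself.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(0.0, max_ratio)
    num_masked = int(ratio * seq_len)
    # Pick the past positions that are hidden from every later query.
    candidates = list(range(1, seq_len))
    masked = set(rng.sample(candidates, min(num_masked, len(candidates))))
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i                          # standard causal constraint
            visible = (j == i) or (j not in masked)  # hidden keys are "forgotten"
            row.append(causal and visible)
        mask.append(row)
    return mask


# Usage: draw the mask for a short sequence ("x" = may attend, "." = blocked).
for row in forgetful_causal_mask(8, max_ratio=0.5, rng=random.Random(0)):
    print("".join("x" if allowed else "." for allowed in row))
```

Setting `max_ratio=0.0` reduces the function to a standard causal mask, which makes it easy to toggle the masking on and off in an experiment.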

Contribution

The paper proposes FCM, a novel masking method that improves the representations learned by language models without additional computational cost, and extends it to T-FCM, a variant that incorporates bidirectional context.
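
The claim of no additional computational cost follows from FCM changing only which entries of the attention mask are blocked, not the shape of the computation. The hypothetical single-head attention routine below (pure Python, unbatched, chosen for readability rather than speed) takes the mask as an argument, so a standard causal mask and a forgetful one run through exactly the same loops. The name `masked_attention` and the toy inputs are illustrative assumptions.

```python
import math


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention over lists of vectors.

    mask[i][j] == True lets query i attend to key j. A causal mask and a
    forgetful causal mask use the same routine; only the boolean pattern
    differs, so the amount of computation is unchanged.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Allowed keys get a scaled dot-product score; blocked keys get -inf.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # finite only if the row allows at least one key
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Usage: all-zero queries and keys make attention a plain average of visible values.
q = k = [[0.0], [0.0], [0.0]]
v = [[1.0], [4.0], [10.0]]
causal = [[j <= i for j in range(3)] for i in range(3)]
forgetful = [row[:] for row in causal]
forgetful[2][1] = False  # hide position 1 from the later query 2
print(masked_attention(q, k, v, causal)[2], masked_attention(q, k, v, forgetful)[2])
```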

Findings

01. FCM significantly improves the few-shot and finetuning performance of PaLM.

02. The authors hypothesize that randomly masking past tokens prevents over-attending to recent tokens and encourages attention to tokens in the distant past.

03. T-FCM further improves performance by incorporating bidirectional context.

Abstract

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks. In this work, we propose a simple technique that significantly boosts the performance of LLMs without adding computational cost. Our key observation is that, by performing the next token prediction task with randomly selected past tokens masked out, we can improve the quality of the learned representations for downstream language understanding tasks. We hypothesize that randomly masking past tokens prevents over-attending to recent tokens and encourages attention to tokens in the distant past. We find that our method, Forgetful Causal Masking (FCM), significantly improves both few-shot and finetuning performance of PaLM. We further…
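
Unlike masked-language-model objectives, the masking described in the abstract does not change what is predicted: every position still predicts its next token, and only the visible context shrinks. The hypothetical helper below makes that explicit by pairing each next-token target with the tokens it conditions on. The name `forgetful_contexts`, the per-sequence mask ratio drawn from [0, max_ratio], and keeping the first and current tokens always visible are assumptions for illustration.

```python
import random


def forgetful_contexts(tokens, max_ratio=0.15, rng=None):
    """Return (examples, masked) for one training sequence.

    examples[i] is (context, target): target is tokens[i + 1], the usual
    next-token target, and context is tokens[: i + 1] with randomly masked
    past positions removed. Assumed details: the mask ratio is drawn from
    uniform [0, max_ratio], position 0 is never masked, and the current
    token tokens[i] always stays visible.
    """
    rng = rng or random.Random()
    n = len(tokens)
    num_masked = int(rng.uniform(0.0, max_ratio) * n)
    masked = set(rng.sample(range(1, n), min(num_masked, max(n - 1, 0))))
    examples = []
    for i in range(n - 1):
        context = [tokens[j] for j in range(i + 1) if j == i or j not in masked]
        examples.append((context, tokens[i + 1]))
    return examples, masked


# Usage: show what each next-token prediction conditions on.
examples, masked = forgetful_contexts("the cat sat on the mat".split(), max_ratio=0.5, rng=random.Random(0))
print("masked positions:", sorted(masked))
for context, target in examples:
    print(context, "->", target)
```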

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Pathways Language Model