BERTs are Generative In-Context Learners

David Samuel

arXiv:2406.04823 · cs.CL · November 1, 2024 · 3 cites

TL;DR

This paper shows that masked language models like DeBERTa can perform in-context learning and generative tasks without extra training, revealing complementary strengths with causal models and suggesting hybrid approaches.

Contribution

Demonstrates that masked language models can exhibit in-context learning and generative abilities using a simple inference method, challenging the focus on causal models.

Findings

01. Masked models outperform causal models on certain tasks.

02. Causal models excel in different categories of tasks.

03. Hybrid approaches could leverage the strengths of both architectures.

Abstract

While in-context learning is commonly associated with causal language models, such as GPT, we demonstrate that this capability also 'emerges' in masked language models. Through an embarrassingly simple inference technique, we enable an existing masked model, DeBERTa, to perform generative tasks without additional training or architectural changes. Our evaluation reveals that masked and causal language models behave very differently, as they clearly outperform each other on different categories of tasks. These complementary strengths suggest that the field's focus on causal models for in-context learning may be limiting: both architectures can develop these capabilities, but with distinct advantages. This points toward promising hybrid approaches that combine the strengths of both objectives.
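The abstract does not spell out the inference technique, so the following is only a minimal sketch of one plausible way to get left-to-right generation from a masked model: append a mask token to the prompt, let the model fill it, keep the prediction, and repeat. The function names, the `[MASK]` and `[SEP]` strings, and the toy lookup-table predictor are all hypothetical stand-ins and are not taken from the paper or its repository.

```python
from typing import Callable, List

MASK = "[MASK]"  # assumed mask token string, for illustration only


def generate_with_masked_model(
    prompt: List[str],
    predict_mask: Callable[[List[str], int], str],
    max_new_tokens: int = 8,
    stop_token: str = "[SEP]",
) -> List[str]:
    """Generate tokens one at a time by repeatedly filling a trailing mask."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        position = len(tokens)        # index of the mask we are about to add
        candidate = tokens + [MASK]   # prompt so far, followed by one mask
        next_token = predict_mask(candidate, position)
        if next_token == stop_token:  # the model signals the end of the output
            break
        tokens.append(next_token)     # commit the prediction and continue
    return tokens[len(prompt):]       # return only the newly generated tokens


def toy_predict_mask(tokens: List[str], position: int) -> str:
    """Hypothetical stand-in for a masked LM: looks up the previous token."""
    table = {"Paris": "is", "is": "in", "in": "France", "France": "[SEP]"}
    return table.get(tokens[position - 1], "[SEP]")


print(generate_with_masked_model(["Paris"], toy_predict_mask))
```

In practice `predict_mask` would wrap a real masked language model such as DeBERTa. The loop itself needs no additional training or architectural change, which is the property the abstract emphasizes.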

Code & Models

Repositories

ltgoslo/bert-in-context (PyTorch, official)

Videos

BERTs are Generative In-Context Learners · SlidesLive

Taxonomy

Topics: Teaching and Learning Programming

Methods: Attention Is All You Need · Discriminative Fine-Tuning · GPT · Focus · Cosine Annealing · Softmax · Layer Normalization · DeBERTa