BERTs are Generative In-Context Learners
David Samuel

TL;DR
This paper shows that masked language models like DeBERTa can perform in-context learning and generative tasks without extra training, revealing complementary strengths with causal models and suggesting hybrid approaches.
Contribution
Demonstrates that masked language models can exhibit in-context learning and generative abilities using a simple inference method, challenging the focus on causal models.
Findings
Masked models outperform causal models on certain tasks
Causal models excel in different categories of tasks
Hybrid approaches could leverage strengths of both architectures
Abstract
While in-context learning is commonly associated with causal language models, such as GPT, we demonstrate that this capability also 'emerges' in masked language models. Through an embarrassingly simple inference technique, we enable an existing masked model, DeBERTa, to perform generative tasks without additional training or architectural changes. Our evaluation reveals that masked and causal language models behave very differently, as each clearly outperforms the other on different categories of tasks. These complementary strengths suggest that the field's focus on causal models for in-context learning may be limiting: both architectures can develop these capabilities, but with distinct advantages, pointing toward promising hybrid approaches that combine the strengths of both objectives.
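The abstract does not spell out the inference technique, so the following is only a minimal sketch of one way a masked model could be made to generate text: append a mask token after the prompt, let the model fill it, and repeat. The names `generate_with_masked_model` and `toy_predictor`, and the `[MASK]` / `[EOS]` strings, are hypothetical; the toy bigram table stands in for a real model such as DeBERTa and is not the author's implementation.

```python
from typing import Callable, List

MASK = "[MASK]"  # assumed mask symbol, for illustration only
EOS = "[EOS]"    # assumed end-of-sequence symbol, for illustration only


def generate_with_masked_model(
    prompt: List[str],
    predict_masked: Callable[[List[str], int], str],
    max_new_tokens: int,
) -> List[str]:
    """Generate left to right by repeatedly filling one trailing mask."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        position = len(tokens)
        # Place a single mask after the current text and ask the model
        # which token belongs at that position.
        next_token = predict_masked(tokens + [MASK], position)
        if next_token == EOS:
            break
        tokens.append(next_token)
    # Return only the newly generated continuation.
    return tokens[len(prompt):]


def toy_predictor(tokens: List[str], position: int) -> str:
    """Hypothetical stand-in for a masked LM: a fixed bigram lookup."""
    bigrams = {"the": "cat", "cat": "sat", "sat": EOS}
    return bigrams.get(tokens[position - 1], EOS)


if __name__ == "__main__":
    print(generate_with_masked_model(["the"], toy_predictor, max_new_tokens=5))
```

With a real masked model, `predict_masked` would run a forward pass over the token sequence and pick a token from the output distribution at the masked position; the loop itself would stay the same, and in-context examples would simply be part of `prompt`.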
Taxonomy
Topics: Teaching and Learning Programming
Methods: Attention Is All You Need · Discriminative Fine-Tuning · GPT · Focus · Cosine Annealing · Softmax · Layer Normalization · DeBERTa
