It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers
Benjamin Clavié, Nathan Cooper, Benjamin Warner

TL;DR
This paper shows that encoder-only models like BERT can be used as generative classifiers with simple training and inference, achieving competitive zero-shot and fine-tuned performance without complex prompts or architecture changes.
Contribution
Introduces ModernBERT-Large-Instruct, a 0.4B-parameter encoder model that uses its MLM head for generative classification, outperforming similarly sized LLMs on MMLU and matching traditional classifiers after fine-tuning.
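To make the idea of "generative classification with an MLM head" concrete, here is a minimal sketch of what the inference step could look like: the model produces logits over the vocabulary at the [MASK] position, and the predicted class is the candidate label whose token scores highest there. This is an illustration under stated assumptions, not the authors' implementation: it assumes every label maps to a single vocabulary token, and the function name, token ids, and logit values are all hypothetical toy data.

```python
import math


def classify_from_mask_logits(mask_logits, label_token_ids):
    """Pick the label whose token scores highest at the [MASK] position.

    mask_logits: sequence of logits indexed by vocabulary id, taken at the
        masked position (toy values here, not real model output).
    label_token_ids: dict mapping a label string to one vocabulary id
        (hypothetical mapping; assumes single-token labels).
    """
    # Keep only the logits that belong to candidate label tokens.
    scores = {label: mask_logits[tok] for label, tok in label_token_ids.items()}

    # Numerically stable softmax restricted to the candidate labels.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}

    best = max(probs, key=probs.get)
    return best, probs


# Toy vocabulary of five tokens; ids 1, 3 and 4 stand in for label tokens.
toy_logits = [0.1, 2.5, -1.0, 0.7, 3.2]
label_ids = {"positive": 1, "neutral": 3, "negative": 4}

prediction, probabilities = classify_from_mask_logits(toy_logits, label_ids)
print(prediction, probabilities)
```

With a real encoder, the logits would come from the MLM head's output at the masked token, and labels that span several tokens would need extra handling that this sketch does not cover.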
Findings
Strong zero-shot performance on classification and knowledge tasks.
Generative approach matches or surpasses traditional classifiers after fine-tuning.
Performance depends on training data diversity and volume.
Abstract
While encoder-only models such as BERT and ModernBERT are ubiquitous in real-world NLP applications, their conventional reliance on task-specific classification heads can limit their applicability compared to decoder-based large language models (LLMs). In this work, we introduce ModernBERT-Large-Instruct, a 0.4B-parameter encoder model that leverages its masked language modelling (MLM) head for generative classification. Our approach employs an intentionally simple training loop and inference mechanism that requires no heavy pre-processing, heavily engineered prompting, or architectural modifications. ModernBERT-Large-Instruct exhibits strong zero-shot performance on both classification and knowledge-based tasks, outperforming similarly sized LLMs on MMLU and achieving 93% of Llama3-1B's MMLU performance with 60% fewer parameters. We also demonstrate that, when fine-tuned, the generative…
Taxonomy
Topics: Natural Language Processing Techniques
