Exploration of Masked and Causal Language Modelling for Text Generation
Nicolo Micheletti, Samuel Belkadi, Lifeng Han, Goran Nenadic

TL;DR
This study compares Masked Language Modelling (MLM) and Causal Language Modelling (CLM) for text generation, finding MLM generally produces more coherent and higher-quality text across various datasets and tasks.
Contribution
The paper provides an extensive empirical comparison of MLM and CLM for text generation, demonstrating MLM's superior performance and exploring its potential for future NLP research.
Findings
MLM outperforms CLM in text quality and coherence
No strong correlation between generated text quality and downstream task performance
MLM shows promise for future text generation research
Abstract
Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach to text generation, Causal Language Modelling (CLM), generates text sequentially from left to right. This inherently limits the model's freedom, because the model cannot decide when and where each token is generated. In contrast, Masked Language Modelling (MLM), primarily used for language understanding tasks, can generate tokens anywhere in the text and in any order. This paper conducts an extensive comparison of MLM and CLM approaches for text generation tasks. To do so, we pre-train several language models of comparable sizes on three different datasets, namely 1) medical discharge summaries, 2) movie plot synopses, and 3) authorship verification datasets. To assess the…
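The difference in generation order described above can be illustrated with a toy sketch. It is not the authors' decoding procedure, which this summary does not describe. `toy_predict`, `TARGET`, `PRIOR` and the `[MASK]` sentinel are hypothetical stand-ins for a trained model. The MLM loop assumes one common strategy: start from a fully masked sequence and, at each step, fill the masked position the model is most confident about. The CLM loop can only fill position i after positions 0..i-1 are fixed.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # hypothetical sentinel for an unfilled position

# Hypothetical toy "model": given the current sequence and a position,
# return (token, confidence). A real LM would compute these from context.
Predictor = Callable[[List[str], int], Tuple[str, float]]

TARGET = ["the", "patient", "was", "discharged", "home"]  # illustrative only
PRIOR = [0.1, 0.5, 0.2, 0.9, 0.3]  # made-up per-position base confidence


def toy_predict(seq: List[str], i: int) -> Tuple[str, float]:
    """Confidence grows with the number of already-filled neighbours."""
    filled_neighbours = sum(
        1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] != MASK
    )
    return TARGET[i], filled_neighbours + PRIOR[i]


def causal_generate(predict: Predictor, length: int) -> Tuple[List[str], List[int]]:
    """CLM-style decoding: strictly left to right, conditioning on the prefix only."""
    seq = [MASK] * length
    order: List[int] = []
    for i in range(length):
        # The model sees only seq[:i]; every later position is still masked.
        view = seq[:i] + [MASK] * (length - i)
        token, _ = predict(view, i)
        seq[i] = token
        order.append(i)
    return seq, order


def masked_generate(predict: Predictor, length: int) -> Tuple[List[str], List[int]]:
    """MLM-style decoding (assumed strategy): fill the most confident masked slot first."""
    seq = [MASK] * length
    order: List[int] = []
    while MASK in seq:
        candidates = []
        for i, tok in enumerate(seq):
            if tok == MASK:
                token, conf = predict(seq, i)
                # -i breaks ties in favour of the leftmost position.
                candidates.append((conf, -i, i, token))
        _, _, i, token = max(candidates)
        seq[i] = token
        order.append(i)
    return seq, order


if __name__ == "__main__":
    clm_seq, clm_order = causal_generate(toy_predict, len(TARGET))
    mlm_seq, mlm_order = masked_generate(toy_predict, len(TARGET))
    print("CLM order:", clm_order)  # [0, 1, 2, 3, 4]
    print("MLM order:", mlm_order)  # [3, 4, 2, 1, 0]
```

Both loops produce the same toy sentence. Only the order of decisions differs: the masked loop commits to "discharged" first and fills outward from it. This is the kind of freedom the abstract attributes to MLM.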
Taxonomy
Topics: Natural Language Processing Techniques
