Exploration of Masked and Causal Language Modelling for Text Generation

Nicolo Micheletti; Samuel Belkadi; Lifeng Han; Goran Nenadic

arXiv:2405.12630·cs.CL·August 12, 2024·2 cites

TL;DR

This study compares Masked Language Modelling (MLM) and Causal Language Modelling (CLM) for text generation, finding MLM generally produces more coherent and higher-quality text across various datasets and tasks.

Contribution

The paper provides an extensive empirical comparison of MLM and CLM for text generation, demonstrating MLM's superior performance and exploring its potential for future NLP research.

Findings

01. MLM outperforms CLM in text quality and coherence

02. No strong correlation between generated text quality and downstream task performance

03. MLM shows promise for future text generation research

Abstract

Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation, Causal Language Modelling (CLM), generates text sequentially from left to right. This inherently limits the freedom of the model, which cannot decide when and where each token is generated. In contrast, Masked Language Modelling (MLM), primarily used for language understanding tasks, can generate tokens anywhere in the text and in any order. This paper conducts an extensive comparison of MLM and CLM approaches for text generation tasks. To do so, we pre-train several language models of comparable sizes on three different datasets, namely 1) medical discharge summaries, 2) movie plot synopses, and 3) authorship verification datasets. To assess the…
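
The difference in generation order that the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: `toy_predict` is a hypothetical stand-in for a trained model, and the random fill order in `generate_masked` is only one illustrative choice among many possible MLM decoding strategies.

```python
import random

MASK = "[MASK]"


def toy_predict(tokens, position):
    """Hypothetical stand-in for a language model.

    A real model would condition on `tokens`; this placeholder only
    returns a label for the position so the fill order is easy to follow.
    """
    return f"tok{position}"


def generate_causal(length):
    """CLM-style decoding: each token is appended strictly left to right."""
    tokens, order = [], []
    for position in range(length):
        tokens.append(toy_predict(tokens, position))
        order.append(position)
    return tokens, order


def generate_masked(length, seed=0):
    """MLM-style decoding: start fully masked, then fill positions in any order."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # assumption: random order, chosen only for illustration
    for position in order:
        tokens[position] = toy_predict(tokens, position)
    return tokens, order


if __name__ == "__main__":
    print(generate_causal(5))
    print(generate_masked(5))
```

Both functions end with every position filled; they differ only in the order in which positions are visited, which is the freedom the abstract attributes to MLM.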

Taxonomy

Topics: Natural Language Processing Techniques