SMCLM: Semantically Meaningful Causal Language Modeling for Autoregressive Paraphrase Generation

Michał Perełkiewicz; Sławomir Dadas; Rafał Poświata

arXiv:2507.03415·cs.CL·July 8, 2025

TL;DR

This paper introduces SMCLM, a self-supervised causal language model that generates high-quality paraphrases by using semantically meaningful representations, achieving state-of-the-art results in unsupervised paraphrase generation.

Contribution

The paper proposes a novel semantically meaningful causal language modeling approach that enhances autoregressive paraphrase generation without supervision.

Findings

01

SMCLM produces robust, high-quality paraphrases.

02

It outperforms existing unsupervised methods.

03

Current automatic metrics have low reliability for paraphrase evaluation.

Abstract

This article introduces semantically meaningful causal language modeling (SMCLM), a self-supervised method of training autoregressive models to generate semantically equivalent text. Our approach involves using semantically meaningful text representation as an initial embedding in the autoregressive training and generation processes. The extensive empirical study demonstrates that the SMCLM approach makes autoregressive models capable of learning robust and high-quality paraphrase generation. The proposed method is competitive with the supervised method and achieves state-of-the-art results in unsupervised approaches. This article also presents a comprehensive set of automatic metrics that cover a wide range of autogenerated paraphrase evaluation aspects. Simultaneously, this article highlights the low reliability of the metrics that are widely used in paraphrase generation evaluation,…
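
The abstract describes feeding a sentence-level representation to the model as its initial embedding. The toy sketch below shows one plausible reading of that idea, not the authors' implementation: the sentence vector takes position 0 of the input sequence, and the token embeddings follow, shifted by one. The function name `build_training_example`, the dictionary-based `embedding_table`, and the assumption that the sentence vector already has the same dimension as the token embeddings are all hypothetical, chosen for illustration only.

```python
def build_training_example(sentence_vec, token_ids, embedding_table):
    """Prefix token embeddings with a sentence-level vector (illustrative only).

    Position 0 holds the sentence vector. Position i (i >= 1) holds the
    embedding of token i-1. The target at position i is token i, so the
    model is asked to reproduce the sentence starting from its vector.
    """
    if not token_ids:
        raise ValueError("token_ids must contain at least one token")

    dim = len(sentence_vec)
    token_vecs = [embedding_table[t] for t in token_ids]
    if any(len(v) != dim for v in token_vecs):
        # Assumption: the sentence vector was already projected to model size.
        raise ValueError("sentence vector and token embeddings differ in size")

    # Inputs: sentence vector first, then all token embeddings except the last.
    inputs = [list(sentence_vec)] + token_vecs[:-1]
    # Targets: the token ids themselves, one per input position.
    targets = list(token_ids)
    return inputs, targets


# Hypothetical two-dimensional embeddings for a three-token vocabulary.
embedding_table = {0: [0.1, 0.2], 1: [0.3, 0.4], 2: [0.5, 0.6]}
sentence_vec = [0.9, -0.9]

inputs, targets = build_training_example(sentence_vec, [2, 0, 1], embedding_table)
```

Under this reading, generation would start from the sentence vector alone at position 0 and sample tokens one at a time. How the sentence representation is obtained and how it is matched to the model dimension is not stated in the abstract and is left out here.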
