Language Modeling and Understanding Through Paraphrase Generation and Detection

Jan Philip Wahle

arXiv:2602.08274·cs.CL·February 25, 2026

Open Access

TL;DR

This paper argues that language models should model paraphrases by decomposing them into their linguistic aspects (paraphrase types), which improves performance on tasks such as plagiarism detection and duplicate question identification.

Contribution

The paper introduces a fine-grained approach to paraphrase modeling that classifies paraphrases into types, improving semantic understanding and downstream task performance.

Findings

01. Models trained on paraphrase types outperform models trained on binary paraphrase pairs.

02. Models trained on paraphrase types reach 89.6% accuracy in plagiarism detection, surpassing human baselines.

03. Duplicate question identification on Quora improves.

Abstract

Language enables humans to share knowledge, reason about the world, and pass on strategies for survival and innovation across generations. At the heart of this process is not just the ability to communicate but also the remarkable flexibility in how we can express ourselves. We can express the same thoughts in virtually infinite ways using different words and structures - this ability to rephrase and reformulate expressions is known as paraphrase. Modeling paraphrases is a keystone to meaning in computational language models; being able to construct different variations of texts that convey the same meaning or not shows strong abilities of semantic understanding. If computational language models are to represent meaning, they must understand and control the different aspects that construct the same meaning as opposed to different meanings at a fine granularity. Yet most existing…

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Advanced Text Analysis Techniques