Language Modeling and Understanding Through Paraphrase Generation and Detection
Jan Philip Wahle

TL;DR
This paper argues that language models should understand and model paraphrases by decomposing them into linguistic aspects, which improves performance on tasks such as plagiarism detection and duplicate question identification.
Contribution
It introduces a fine-grained approach to paraphrase modeling by classifying paraphrases into types, enhancing semantic understanding and downstream task performance.
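To make the contrast between binary and type-based labels concrete, here is a minimal Python sketch. The type names in `PARAPHRASE_TYPES`, the `PairAnnotation` class, and the rule that a pair counts as a paraphrase when at least one type applies are all illustrative assumptions, not the paper's taxonomy or method.

```python
from dataclasses import dataclass

# Hypothetical type inventory for illustration only; the paper's actual
# taxonomy is not reproduced here.
PARAPHRASE_TYPES = ("lexical_substitution", "word_order_change", "voice_change")


@dataclass(frozen=True)
class PairAnnotation:
    """A sentence pair annotated with the paraphrase types that relate it."""
    text_a: str
    text_b: str
    types: frozenset = frozenset()

    @property
    def is_paraphrase(self) -> bool:
        # Binary view (simplifying assumption): a pair is a paraphrase
        # if at least one type applies.
        return len(self.types) > 0

    def type_vector(self) -> list:
        # Fine-grained view: one 0/1 entry per type, in PARAPHRASE_TYPES
        # order, usable as a multi-label classification target.
        return [1 if t in self.types else 0 for t in PARAPHRASE_TYPES]


example = PairAnnotation(
    "The cat chased the mouse.",
    "The mouse was chased by the cat.",
    frozenset({"voice_change", "word_order_change"}),
)
binary_label = example.is_paraphrase
multi_label = example.type_vector()
```

The binary label collapses every kind of rewording into a single bit, while the type vector keeps which linguistic changes were made; that extra signal is what a type-aware model would train on.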
Findings
Models trained on paraphrase types outperform binary-based models.
Achieved 89.6% accuracy in plagiarism detection, surpassing human baselines.
Improved duplicate question identification on Quora.
Abstract
Language enables humans to share knowledge, reason about the world, and pass on strategies for survival and innovation across generations. At the heart of this process is not just the ability to communicate but also the remarkable flexibility in how we can express ourselves. We can express the same thoughts in virtually infinite ways using different words and structures - this ability to rephrase and reformulate expressions is known as paraphrase. Modeling paraphrases is a keystone to meaning in computational language models; being able to construct different variations of texts that convey the same meaning or not shows strong abilities of semantic understanding. If computational language models are to represent meaning, they must understand and control the different aspects that construct the same meaning as opposed to different meanings at a fine granularity. Yet most existing…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Advanced Text Analysis Techniques
