A Deep Network Model for Paraphrase Detection in Short Text Messages
Basant Agarwal, Heri Ramampiaro, Helge Langseth, Massimiliano Ruocco

TL;DR
This paper introduces a deep neural network approach that combines CNNs and LSTMs with word-level similarity for paraphrase detection in noisy short texts, outperforming existing methods, especially on social media data.
Contribution
It presents a novel deep learning model that effectively handles noisy user-generated texts for paraphrase detection, improving over prior approaches.
Findings
Outperforms state-of-the-art methods on noisy Twitter data
Achieves competitive results on cleaner datasets
Handles language irregularities and noise effectively
Abstract
This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied on clean texts, but they do not necessarily deliver good performance against noisy texts. Challenges with paraphrase detection on user generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a…
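The abstract is cut off before it describes the word-level component, so the following is only a minimal sketch of the general idea of word-level similarity between two short texts, not the authors' model. It builds a pairwise cosine-similarity matrix over the words of two sentences and reduces it to one score. The toy vectors, the function names (`cosine`, `similarity_matrix`, `best_match_score`), and the exact-match fallback for out-of-vocabulary tokens are all assumptions made for illustration.

```python
import math

# Hypothetical toy embeddings (assumption); a real system would load
# pretrained word vectors instead.
TOY_VECTORS = {
    "cheap": [0.9, 0.1, 0.0],
    "inexpensive": [0.85, 0.15, 0.05],
    "flights": [0.1, 0.9, 0.2],
    "flight": [0.12, 0.88, 0.18],
    "tickets": [0.2, 0.7, 0.6],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(tokens_a, tokens_b, vectors):
    """Pairwise word similarity: one row per word in tokens_a, one column per word in tokens_b.

    Words missing from `vectors` (common in noisy text) fall back to
    exact string match: 1.0 if identical, else 0.0. This fallback is an
    illustrative assumption.
    """
    matrix = []
    for wa in tokens_a:
        row = []
        for wb in tokens_b:
            if wa in vectors and wb in vectors:
                row.append(cosine(vectors[wa], vectors[wb]))
            else:
                row.append(1.0 if wa == wb else 0.0)
        matrix.append(row)
    return matrix


def best_match_score(matrix):
    """Average, over the first sentence's words, of each word's best match."""
    if not matrix or not matrix[0]:
        return 0.0
    return sum(max(row) for row in matrix) / len(matrix)


a = "cheap flights".split()
b = "inexpensive flight tickets".split()
m = similarity_matrix(a, b, TOY_VECTORS)
score = best_match_score(m)
print(f"{len(m)}x{len(m[0])} similarity matrix, best-match score = {score:.3f}")
```

In a setup like the one the abstract outlines, word-level features of this kind would sit alongside the sentence-level representations from the CNN and LSTM. How the paper actually combines them is not stated in the truncated text.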
