A Deep Network Model for Paraphrase Detection in Short Text Messages

Basant Agarwal; Heri Ramampiaro; Helge Langseth; Massimiliano Ruocco

arXiv:1712.02820 · cs.IR · July 18, 2018

TL;DR

This paper introduces a deep neural network approach that combines CNNs and LSTMs with word-level similarity for paraphrase detection in noisy short texts, outperforming existing methods, especially on social media data.

Contribution

The paper presents a novel deep learning model for paraphrase detection that handles noisy user-generated text and improves on prior approaches.

Findings

01. Outperforms state-of-the-art methods on noisy Twitter data

02. Achieves competitive results on cleaner datasets

03. Handles language irregularities and noise effectively

Abstract

This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication, and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied to clean texts, but they do not necessarily deliver good performance on noisy texts. Challenges with paraphrase detection on user-generated short texts, such as Twitter posts, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a…
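The TL;DR above mentions word-level similarity between the two sentences. As a rough illustration of that general idea only, the sketch below builds a pairwise cosine-similarity matrix over the tokens of two short texts. The abstract is truncated before it describes this component, so everything here is an assumption made for illustration: the toy 3-dimensional embeddings, the choice of cosine similarity, the zero vector for out-of-vocabulary tokens, and the names `cosine`, `similarity_matrix`, and `EMBEDDINGS`. It is not the authors' model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(tokens_a, tokens_b, embeddings, dim):
    """Return a len(tokens_a) x len(tokens_b) matrix of word-pair similarities.

    Tokens missing from `embeddings` (e.g. noisy spellings) are mapped to a
    zero vector, so their similarity to every other word is 0.0. This is an
    assumption of the sketch, not something stated in the source.
    """
    zero = [0.0] * dim
    return [
        [cosine(embeddings.get(a, zero), embeddings.get(b, zero)) for b in tokens_b]
        for a in tokens_a
    ]


# Hypothetical toy embeddings, hand-picked so that near-synonyms are close.
EMBEDDINGS = {
    "good": [1.0, 0.0, 0.0],
    "great": [0.9, 0.1, 0.0],
    "movie": [0.0, 1.0, 0.0],
    "film": [0.0, 0.9, 0.1],
}

# Rows follow the first sentence, columns the second.
sim = similarity_matrix(["good", "movie"], ["great", "film"], EMBEDDINGS, dim=3)

# A noisy token such as "gr8" is out of vocabulary in this toy table.
oov = similarity_matrix(["gr8"], ["great"], EMBEDDINGS, dim=3)
```

In this toy setup the matching word pairs ("good"/"great", "movie"/"film") score higher than the mismatched ones, while the out-of-vocabulary token scores zero against everything. That second case hints at why noisy short texts are harder than clean ones for purely word-level comparisons.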
