A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts

Muhammad Haroon Shakeel; Asim Karim; Imdadullah Khan

arXiv:1912.12068 · cs.CL · January 16, 2020

TL;DR

This paper introduces a novel data augmentation method and a multi-cascaded deep learning model to improve paraphrase detection in short texts, achieving state-of-the-art results on benchmark datasets.

Contribution

It presents a graph-based data augmentation strategy combined with a multi-cascaded CNN-LSTM model for more effective paraphrase detection in short texts.

Findings

1. Achieves state-of-the-art performance on three benchmark datasets.

2. Demonstrates robustness across clean and noisy short texts.

3. Enhances paraphrase detection accuracy with combined deep and hand-crafted features.

Abstract

Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augmentation strategy considers the notions of paraphrases and non-paraphrases as binary relations over the set of texts. Subsequently, it uses graph theoretic concepts to efficiently generate additional paraphrase and non-paraphrase pairs in a sound manner. Our multi-cascaded model employs three supervised feature learners (cascades) based on CNN and LSTM networks with and without…
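
The abstract does not spell out the graph construction, so the following is only a minimal sketch of one way such an augmentation could work, not the authors' implementation. It assumes the paraphrase relation is treated as symmetric and transitive, so texts linked by paraphrase labels form groups (connected components), and that a non-paraphrase label between two texts extends to every pair drawn from their two groups. The function name `augment_pairs` and the union-find bookkeeping are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def augment_pairs(paraphrases, non_paraphrases):
    """Infer extra labeled pairs from existing ones (illustrative sketch).

    Assumption (not stated in the abstract): paraphrase is symmetric and
    transitive, and a non-paraphrase label holds between whole groups.
    Returns (positives, negatives) as sets of sorted 2-tuples, which
    include the original pairs as well as the inferred ones.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every text as its own group before merging.
    for a, b in list(paraphrases) + list(non_paraphrases):
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    # Merge texts connected by a paraphrase label.
    for a, b in paraphrases:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each paraphrase group.
    groups = {}
    for text in parent:
        groups.setdefault(find(text), set()).add(text)

    # Every pair inside a group is a paraphrase pair.
    positives = set()
    for members in groups.values():
        positives.update(combinations(sorted(members), 2))

    # A non-paraphrase label between two groups applies to all cross pairs.
    negatives = set()
    for a, b in non_paraphrases:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            raise ValueError(f"conflicting labels for {a!r} and {b!r}")
        for x in groups[root_a]:
            for y in groups[root_b]:
                negatives.add(tuple(sorted((x, y))))

    return positives, negatives
```

For example, given the paraphrase pairs (a, b) and (b, c) and the non-paraphrase pair (c, d), this sketch infers the additional paraphrase pair (a, c) and the additional non-paraphrase pairs (a, d) and (b, d). Raising an error on conflicting labels is one design choice; with noisy annotations, skipping such pairs may be preferable.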

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory