On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

Aamir Miyajiwala; Arnav Ladkat; Samiksha Jagadale; Raviraj Joshi

arXiv:2201.00318·cs.CL·July 8, 2022

TL;DR

This study evaluates how practical input perturbations, namely adding or removing tokens such as punctuation and stop-words, affect deep learning text classifiers (CNN, LSTM, and BERT), and finds that all of them are sensitive to such changes.

Contribution

The paper provides a systematic analysis of the impact of practical input perturbations on deep learning text classifiers, highlighting their vulnerabilities and offering guidance for robustness assessment.

Findings

01. BERT is more sensitive to token removal than to token addition.

02. LSTM is slightly more sensitive to perturbations than CNN.

03. Even minor input changes can degrade the performance of deep learning models.

Abstract

Text classification is a fundamental Natural Language Processing task that has a wide variety of applications, where deep learning approaches have produced state-of-the-art results. While these models have been heavily criticized for their black-box nature, their robustness to slight perturbations in input text has been a matter of concern. In this work, we carry out a data-focused study evaluating the impact of systematic practical perturbations on the performance of the deep learning based text classification models like CNN, LSTM, and BERT-based algorithms. The perturbations are induced by the addition and removal of unwanted tokens like punctuation and stop-words that are minimally associated with the final performance of the model. We show that these deep learning approaches including BERT are sensitive to such legitimate input perturbations on four standard benchmark datasets…
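
The perturbations described in the abstract (adding and removing punctuation and stop-words) can be illustrated with a short sketch. This is not the authors' code: the function names, the tiny `STOP_WORDS` set, and the random insertion strategy are all assumptions made for illustration, and the paper's actual token lists and procedure may differ.

```python
import random
import string

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch; the list used in the paper is not given in this summary).
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in"}


def remove_punctuation(text: str) -> str:
    """Drop every ASCII punctuation character, then normalise whitespace."""
    stripped = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def remove_stop_words(text: str) -> str:
    """Drop whitespace-separated tokens found in STOP_WORDS (case-insensitive)."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)


def add_tokens(text: str, tokens: list[str], n: int, seed: int = 0) -> str:
    """Insert n tokens, drawn at random from `tokens`, at random positions."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    words = text.split()
    for _ in range(n):
        position = rng.randint(0, len(words))  # both ends inclusive
        words.insert(position, rng.choice(tokens))
    return " ".join(words)
```

A robustness check in this spirit would apply one of these functions to the test inputs only, keep the labels unchanged, and compare the classifier's accuracy on the original and perturbed inputs. Note that `remove_stop_words` splits on whitespace, so a stop-word with punctuation attached (for example `the,`) is left in place unless punctuation is removed first.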

Taxonomy

Methods

Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Softmax · Tanh Activation · WordPiece · Adam