Assessing Robustness of Text Classification through Maximal Safe Radius Computation

Emanuele La Malfa; Min Wu; Luca Laurenti; Benjie Wang; Anthony Hartshorn; Marta Kwiatkowska

arXiv:2010.02004·cs.CL·May 6, 2021

TL;DR

This paper introduces a method to estimate the robustness of text classification models against word substitutions by approximating the maximal safe radius, providing guarantees on prediction stability.

Contribution

It proposes a novel framework combining Monte Carlo Tree Search and linear bounding techniques to approximate the maximal safe radius for NLP models, enhancing robustness analysis.

Findings

01 The methods effectively estimate robustness bounds across multiple datasets.

02 Robustness trends vary with different embeddings and models.

03 The framework offers insights for interpretability and model reliability.

Abstract

Neural network NLP models are vulnerable to small modifications of the input that maintain the original meaning but result in a different prediction. In this paper, we focus on robustness of text classification against word substitutions, aiming to provide guarantees that the model prediction does not change if a word is replaced with a plausible alternative, such as a synonym. As a measure of robustness, we adopt the notion of the maximal safe radius for a given input text, which is the minimum distance in the embedding space to the decision boundary. Since computing the exact maximal safe radius is not feasible in practice, we instead approximate it by computing a lower and upper bound. For the upper bound computation, we employ Monte Carlo Tree Search in conjunction with syntactic filtering to analyse the effect of single and multiple word substitutions. The lower bound computation…
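To make the notion of a safe radius and its upper bound concrete, here is a minimal sketch on a toy problem. It is not the paper's method: the classifier is assumed to be a linear model over a two-dimensional embedding (for which the distance to the decision boundary has a closed form), and candidate substitutions are enumerated by brute force rather than explored with Monte Carlo Tree Search. All function names and values are hypothetical and chosen only for illustration.

```python
import math

def predict(w, b, x):
    """Toy linear classifier: class 1 if w.x + b >= 0, else class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0

def exact_radius_linear(w, b, x):
    """Distance from x to the hyperplane w.x + b = 0.

    For a linear model this is the exact L2 safe radius; for a neural
    network no such closed form exists, hence the need for bounds.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def upper_bound_from_substitutions(w, b, x, candidates):
    """Upper bound on the safe radius from candidate substitutions.

    Any candidate embedding that flips the prediction proves that the
    safe radius is at most its distance from x. Returns math.inf when
    no candidate changes the prediction.
    """
    base = predict(w, b, x)
    flips = [math.dist(x, c) for c in candidates if predict(w, b, c) != base]
    return min(flips) if flips else math.inf

# Hypothetical toy values (assumptions, not data from the paper).
w, b = [1.0, 0.0], 0.0
x = [2.0, 0.0]
candidates = [[1.0, 0.0], [-1.0, 0.0], [-2.0, 3.0]]

exact = exact_radius_linear(w, b, x)
upper = upper_bound_from_substitutions(w, b, x, candidates)
print(exact, upper)
```

In this toy setting the bound found through substitutions can never fall below the exact radius, because every prediction-changing embedding lies on or beyond the decision boundary. A better search over substitutions tightens the bound from above.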

Code & Models

Repositories

EmanueleLM/MCTS
PyTorch · Official

Taxonomy

Methods: Interpretability · Local Interpretable Model-Agnostic Explanations