Understanding Model Robustness to User-generated Noisy Texts

Jakub Náplava; Martin Popel; Milan Straka; Jana Straková

arXiv:2110.07428·cs.CL·November 18, 2021

TL;DR

This paper investigates the robustness of NLP models to user-generated noise. It proposes a statistical noise model estimated from grammatical-error-correction corpora and evaluates methods for improving performance under noisy conditions across multiple languages and tasks.

Contribution

The paper introduces a statistically grounded noise-generation framework and provides a comprehensive evaluation of robustness and mitigation strategies for NLP models.

Findings

01. Noise modeling improves robustness across tasks
02. Training with noised data enhances model resilience
03. External correction systems reduce performance degradation

Abstract

Sensitivity of deep-neural models to input noise is known to be a challenging problem. In NLP, model performance often deteriorates with naturally occurring noise, such as spelling errors. To mitigate this issue, models may leverage artificially noised data. However, the amount and type of generated noise has so far been determined arbitrarily. We therefore propose to model the errors statistically from grammatical-error-correction corpora. We present a thorough evaluation of several state-of-the-art NLP systems' robustness in multiple languages, with tasks including morpho-syntactic analysis, named entity recognition, neural machine translation, a subset of the GLUE benchmark and reading comprehension. We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with external…
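
The idea of estimating noise statistically from correction data, rather than choosing its amount and type by hand, can be illustrated with a toy sketch. This is not the authors' framework or the code in ufal/kazitext: the function names `estimate_noise_model` and `add_noise`, the token-level granularity, and the assumption that the corpus is already available as aligned (noisy token, corrected token) pairs are all hypothetical simplifications made for illustration.

```python
import random
from collections import Counter, defaultdict


def estimate_noise_model(aligned_pairs):
    """Estimate per-token error statistics from (noisy, clean) token pairs.

    Hypothetical helper: assumes the correction corpus has already been
    aligned token by token, which real GEC data does not guarantee.
    Returns {clean_token: (error_probability, {noisy_form: count})}.
    """
    seen = Counter()
    errors = defaultdict(Counter)
    for noisy, clean in aligned_pairs:
        seen[clean] += 1
        if noisy != clean:
            errors[clean][noisy] += 1

    model = {}
    for clean, count in seen.items():
        total_errors = sum(errors[clean].values())
        if total_errors:
            model[clean] = (total_errors / count, dict(errors[clean]))
    return model


def add_noise(tokens, model, rng):
    """Corrupt clean tokens by sampling from the estimated error statistics."""
    noised = []
    for token in tokens:
        entry = model.get(token)
        if entry is not None and rng.random() < entry[0]:
            forms, weights = zip(*entry[1].items())
            noised.append(rng.choices(forms, weights=weights, k=1)[0])
        else:
            noised.append(token)
    return noised


# Toy data standing in for an aligned correction corpus (invented examples).
toy_pairs = [
    ("teh", "the"), ("the", "the"), ("the", "the"), ("hte", "the"),
    ("recieve", "receive"), ("cat", "cat"),
]
model = estimate_noise_model(toy_pairs)
clean_sentence = "the cat will receive the prize".split()
print(add_noise(clean_sentence, model, random.Random(0)))
```

In this sketch both the error rate and the choice of erroneous form come from counts in the data, so text noised this way could be used either as training data or as a noisy test set.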

Code & Models

Repositories

ufal/kazitext (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification