Understanding Model Robustness to User-generated Noisy Texts
Jakub Náplava, Martin Popel, Milan Straka, Jana Straková

TL;DR
This paper investigates the robustness of NLP models to user-generated noise, proposes a statistical noise model estimated from grammatical-error-correction data, and evaluates methods for improving performance under noisy conditions across multiple languages and tasks.
Contribution
It introduces a statistically grounded noise generation framework and provides a comprehensive evaluation of robustness and mitigation strategies for NLP models.
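The idea of estimating noise from grammatical-error-correction data can be illustrated with a minimal sketch. This is not the authors' framework: the function names `estimate_substitutions` and `add_noise` are hypothetical, the three sentence pairs are made-up toy data, and the sketch assumes word-level substitutions in pairs that are already aligned one token to one token. It only shows the general principle of counting observed errors and then sampling them in proportion to their frequency.

```python
import random
from collections import Counter, defaultdict


def estimate_substitutions(pairs):
    """Count how often each clean token appears as each noisy form.

    `pairs` is an iterable of (noisy_sentence, corrected_sentence) strings.
    Assumption: only pairs with the same number of tokens are used, so that
    tokens can be aligned position by position.
    """
    counts = defaultdict(Counter)
    for noisy, clean in pairs:
        noisy_toks, clean_toks = noisy.split(), clean.split()
        if len(noisy_toks) != len(clean_toks):
            continue  # this sketch skips pairs that are not 1:1 aligned
        for n, c in zip(noisy_toks, clean_toks):
            counts[c][n] += 1
    return counts


def add_noise(sentence, counts, rng):
    """Replace each token by a form sampled from its observed distribution.

    Tokens never seen in the corpus are left unchanged.
    """
    out = []
    for tok in sentence.split():
        options = counts.get(tok)
        if not options:
            out.append(tok)
            continue
        forms = list(options)
        weights = [options[f] for f in forms]
        out.append(rng.choices(forms, weights=weights, k=1)[0])
    return " ".join(out)


# Toy (noisy, corrected) pairs, invented for illustration only.
toy_pairs = [
    ("I recieve mail", "I receive mail"),
    ("They receive mail", "They receive mail"),
    ("We definately agree", "We definitely agree"),
]

counts = estimate_substitutions(toy_pairs)
rng = random.Random(0)  # seeded so repeated runs sample the same noise
noised = add_noise("I receive mail", counts, rng)
```

In this toy corpus "receive" is misspelled in one of its two occurrences, so the sampler rewrites it as "recieve" about half the time, while "mail" is never changed. Sampling from the empirical distribution, rather than applying a fixed corruption rate, is what ties the amount and type of generated noise to the data.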
Findings
Noise modeling improves robustness across tasks
Training with noised data enhances model resilience
External correction systems reduce performance degradation
Abstract
Sensitivity of deep-neural models to input noise is known to be a challenging problem. In NLP, model performance often deteriorates with naturally occurring noise, such as spelling errors. To mitigate this issue, models may leverage artificially noised data. However, the amount and type of generated noise has so far been determined arbitrarily. We therefore propose to model the errors statistically from grammatical-error-correction corpora. We present a thorough evaluation of several state-of-the-art NLP systems' robustness in multiple languages, with tasks including morpho-syntactic analysis, named entity recognition, neural machine translation, a subset of the GLUE benchmark and reading comprehension. We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with external…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
