Are Language Models Agnostic to Linguistically Grounded Perturbations? A Case Study of Indic Languages
Poulami Ghosh, Raj Dabre, Pushpak Bhattacharyya

TL;DR
This study examines the vulnerability of pre-trained language models (PLMs) to linguistically grounded perturbations across Indic languages. It finds that PLMs remain susceptible to these subtle attacks, though slightly less so than to non-linguistic ones.
Contribution
First comprehensive analysis of PLMs' susceptibility to linguistically grounded perturbations in multiple Indic languages and downstream tasks.
Findings
PLMs are susceptible to linguistic perturbations.
PLMs show slightly lower susceptibility to linguistic attacks compared to non-linguistic ones.
Even constrained, linguistically grounded attacks remain effective.
Abstract
Pre-trained language models (PLMs) are known to be susceptible to perturbations of the input text, but existing work does not explicitly focus on linguistically grounded attacks, which are subtle and more prevalent in nature. In this paper, we study whether PLMs are agnostic to such attacks. To this end, we offer the first study of this question, covering several Indic languages and a range of downstream tasks. Our findings reveal that although PLMs are susceptible to linguistic perturbations, they are slightly less susceptible to them than to non-linguistic attacks. This highlights that even constrained attacks are effective. Moreover, we investigate the implications of these outcomes across a range of languages spanning diverse language families and scripts.
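The abstract does not spell out the specific perturbations or metrics, so what follows is only a minimal sketch, under stated assumptions, of how one might compare a classifier's susceptibility to a linguistically grounded perturbation versus a non-linguistic one. Every piece is a hypothetical stand-in chosen for illustration, not the authors' attack set or evaluation protocol. The pieces are: a swap between the Devanagari short-i and long-ii vowel signs (a plausible Hindi spelling variation), a random adjacent-character swap as the non-linguistic baseline, a "flip rate" metric (the share of originally correct predictions that change), and a keyword classifier standing in for a PLM.

```python
import random
from typing import Callable, Sequence

# Hypothetical, illustrative perturbations (NOT the paper's attack set).
SHORT_I, LONG_II = "\u093f", "\u0940"  # Devanagari vowel signs (short i, long ii)
_I_SWAP = str.maketrans({SHORT_I: LONG_II, LONG_II: SHORT_I})


def linguistic_perturb(text: str) -> str:
    """Swap short/long i vowel signs: a plausible Hindi spelling variation."""
    return text.translate(_I_SWAP)


def random_char_swap(text: str, rng: random.Random) -> str:
    """Non-linguistic baseline: swap two adjacent code points at random."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def flip_rate(
    predict: Callable[[str], int],
    texts: Sequence[str],
    labels: Sequence[int],
    perturb: Callable[[str], str],
) -> float:
    """Hypothetical metric: share of originally correct predictions that change under `perturb`."""
    correct = [(t, y) for t, y in zip(texts, labels) if predict(t) == y]
    if not correct:
        return 0.0
    flipped = sum(predict(perturb(t)) != y for t, y in correct)
    return flipped / len(correct)


def toy_predict(text: str) -> int:
    """Toy keyword 'model' standing in for a fine-tuned PLM (hypothetical)."""
    return 1 if "अच्छी" in text else 0  # "अच्छी" means "good"


if __name__ == "__main__":
    texts = ["फिल्म अच्छी थी", "खाना ठंडा था"]  # "the film was good", "the food was cold"
    labels = [1, 0]
    rng = random.Random(0)
    ling = flip_rate(toy_predict, texts, labels, linguistic_perturb)
    non_ling = flip_rate(toy_predict, texts, labels, lambda t: random_char_swap(t, rng))
    print(f"linguistic flip rate: {ling:.2f}")
    print(f"random-swap flip rate: {non_ling:.2f}")
```

With this toy data, the vowel-sign swap turns अच्छी into अच्छि and flips the positive example, so the linguistic flip rate is 0.50. A real comparison would replace `toy_predict` with a fine-tuned PLM and average over many perturbed samples per language and task.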
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution
