IndicSentEval: How Effectively do Multilingual Transformer Models encode Linguistic Properties for Indic Languages?
Akhilesh Aravapalli, Mounika Marreddy, Radhika Mamidi, Manish Gupta, Subba Reddy Oota

TL;DR
This study evaluates how well multilingual Transformer models encode 8 linguistic properties in 6 Indic languages, and how robust that encoding is under 13 input perturbations. It uses a new benchmark dataset and reveals both strengths and weaknesses in encoding and robustness.
Contribution
Introduces IndicSentEval, a novel benchmark dataset of approximately 47K sentences for probing multilingual models on Indic languages. It also analyzes encoding capability and robustness across 9 multilingual Transformer models (7 universal, 2 Indic-specific) and 13 perturbations.
Findings
Indic-specific models better encode Indic linguistic properties.
Universal models show greater robustness to input perturbations.
Multilingual models perform well on English but have mixed results on Indic languages.
Abstract
Transformer-based models have revolutionized the field of natural language processing. To understand why they perform so well and to assess their reliability, several studies have focused on questions such as: Which linguistic properties are encoded by these models, and to what extent? How robust are these models in encoding linguistic properties when faced with perturbations in the input text? However, these studies have mainly focused on BERT and the English language. In this paper, we investigate similar questions regarding encoding capability and robustness for 8 linguistic properties across 13 different perturbations in 6 Indic languages, using 9 multilingual Transformer models (7 universal and 2 Indic-specific). To conduct this study, we introduce a novel multilingual benchmark dataset, IndicSentEval, containing approximately 47K sentences. Surprisingly, our probing analysis…
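Probing analyses of this kind typically train a lightweight classifier on frozen sentence representations and measure how well it recovers a linguistic property. Robustness can then be gauged by how much the probe's accuracy drops when the input text is perturbed.

The following is a minimal sketch of that general idea, not the authors' implementation. It makes several simplifying assumptions:
- the property is binary;
- the probe is a plain logistic-regression classifier trained by gradient descent;
- real model embeddings are replaced by synthetic two-dimensional vectors from a hypothetical `synthetic_split` helper, and a perturbation is simulated by weakening the property signal in those vectors.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(X: Sequence[Vector], y: Sequence[int],
                epochs: int = 200, lr: float = 0.5) -> Tuple[Vector, float]:
    """Fit a binary logistic-regression probe with full-batch gradient descent.

    The encoder that produced X is assumed frozen; only w and b are learned.
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w: Vector, b: float, X: Sequence[Vector], y: Sequence[int]) -> float:
    # A prediction is label 1 when the logit is positive.
    correct = sum(
        int((sum(wj * xj for wj, xj in zip(w, xi)) + b > 0) == bool(yi))
        for xi, yi in zip(X, y)
    )
    return correct / len(y)


def synthetic_split(rng: random.Random, n: int, signal: float,
                    noise: float) -> Tuple[List[Vector], List[int]]:
    """Hypothetical stand-in for frozen sentence embeddings.

    Dimension 0 carries the property with strength `signal`; dimension 1 is noise.
    """
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([(2 * label - 1) * signal + rng.gauss(0.0, noise),
                  rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = synthetic_split(rng, 400, signal=1.0, noise=0.5)
    X_test, y_test = synthetic_split(rng, 200, signal=1.0, noise=0.5)
    # Simulated perturbation: the property is encoded more weakly.
    X_pert, y_pert = synthetic_split(rng, 200, signal=0.3, noise=1.0)

    w, b = train_probe(X_train, y_train)
    acc_orig = accuracy(w, b, X_test, y_test)
    acc_pert = accuracy(w, b, X_pert, y_pert)
    print(f"clean accuracy: {acc_orig:.3f}, perturbed accuracy: {acc_pert:.3f}, "
          f"drop: {acc_orig - acc_pert:.3f}")
```

Running the script prints the probe's accuracy on clean and on perturbed test vectors. With real data, the clean and perturbed inputs would instead come from encoding original and perturbed sentences with the same frozen model layer, so that any accuracy drop reflects the perturbation rather than a change in the probe.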
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling
Methods: Attention Is All You Need · Dense Connections · Adam · WordPiece · Attention Dropout · Linear Layer · Residual Connection · Weight Decay · Position-Wise Feed-Forward Layer
