Vyākaraṇa: A Colorless Green Benchmark for Syntactic Evaluation in Indic Languages

Rajaswa Patil; Jasleen Dhillon; Siddhant Mahurkar; Saumitra Kulkarni; Manav Malhotra; Veeky Baths

arXiv:2103.00854·cs.CL·October 5, 2021

TL;DR

This paper introduces Vyākaraṇa, a benchmark of syntactic evaluation tasks for Indic languages, and shows that current multilingual models, Indic-specific ones in particular, struggle to capture Indic syntax.

Contribution

It presents Vyākaraṇa, a novel benchmark for syntactic evaluation in Indic languages, and analyzes the syntactic understanding of various multilingual models, including Indic-specific ones.

Findings

01

Indic language models underperform on syntax tasks compared to other multilingual models.

02

Unlike the other multilingual models, Indic models do not localize syntax in their middle layers.

03

Code-switching experiments highlight challenges in syntactic understanding in mixed-language contexts.

Abstract

While there has been significant progress towards developing NLU resources for Indic languages, syntactic evaluation has been relatively less explored. Unlike English, Indic languages have rich morphosyntax, grammatical genders, free linear word-order, and highly inflectional morphology. In this paper, we introduce Vyākaraṇa: a benchmark of Colorless Green sentences in Indic languages for syntactic evaluation of multilingual language models. The benchmark comprises four syntax-related tasks: PoS Tagging, Syntax Tree-depth Prediction, Grammatical Case Marking, and Subject-Verb Agreement. We use the datasets from the evaluation tasks to probe five multilingual language models of varying architectures for syntax in Indic languages. Due to its prevalence, we also include a code-switching setting in our experiments. Our results show that the token-level and sentence-level representations…
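The abstract does not spell out how the Colorless Green sentences are built. As a rough illustration only, the sketch below assumes the usual construction, in which each content word is swapped for a random word with the same PoS tag and morphological features, so that the sentence stays grammatical while losing its meaning. The toy English lexicon, the tag and feature strings, and the function name `colorless_green` are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy lexicon keyed by (PoS tag, morphological features).
# A real benchmark would draw these from a tagged corpus in the target language.
LEXICON = {
    ("NOUN", "Number=Sing"): ["river", "idea", "lamp"],
    ("VERB", "Number=Sing"): ["sleeps", "runs", "sings"],
    ("ADJ", ""): ["green", "loud", "colorless"],
}

# Only content words are replaced; function words are kept as they are.
CONTENT_POS = {"NOUN", "VERB", "ADJ"}


def colorless_green(tagged_sentence, rng):
    """Swap each content word for a random word with the same tag and features."""
    perturbed = []
    for word, pos, feats in tagged_sentence:
        candidates = LEXICON.get((pos, feats))
        if pos in CONTENT_POS and candidates:
            perturbed.append((rng.choice(candidates), pos, feats))
        else:
            perturbed.append((word, pos, feats))
    return perturbed


sentence = [
    ("the", "DET", ""),
    ("small", "ADJ", ""),
    ("dog", "NOUN", "Number=Sing"),
    ("barks", "VERB", "Number=Sing"),
]
rng = random.Random(0)  # seeded so that runs are repeatable
perturbed = colorless_green(sentence, rng)
print(" ".join(word for word, _, _ in perturbed))
```

Because the tags and features are carried over unchanged, the labels for tasks such as PoS tagging or subject-verb agreement remain valid for the perturbed sentence, which is what lets such sentences test syntax separately from lexical semantics.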

Taxonomy

Methods: XLM-R · mBERT