Multilingual Coreference Resolution in Low-resource South Asian Languages

Ritwik Mishra; Pooja Desur; Rajiv Ratn Shah; Ponnurangam Kumaraguru

arXiv:2402.13571 · cs.CL · March 26, 2024 · 1 citation

TL;DR

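This paper introduces TransMuCoRes, a translated multilingual coreference resolution dataset covering 31 South Asian languages. It trains two off-the-shelf coreference resolution models on TransMuCoRes combined with a manually annotated Hindi dataset, reports their performance on Hindi, and discusses the limitations of current evaluation metrics.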

Contribution

The paper presents the first end-to-end coreference resolution evaluation on Hindi and introduces a new multilingual dataset for low-resource South Asian languages.

Findings

01

The best-performing model achieved an LEA F1 of 64 and a CoNLL F1 of 68 on Hindi.

02

Nearly all predicted translations passed a sanity check, and 75% of English references aligned with their predicted translations.

03

Current evaluation metrics have limitations for datasets with split antecedents.
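
For readers unfamiliar with the reported metric: CoNLL F1 is conventionally the unweighted mean of the MUC, B³, and CEAF-e F1 scores. The sketch below only illustrates that averaging step. The function name and the three input scores are hypothetical and are not taken from the paper.

```python
def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Return the CoNLL F1: the unweighted mean of three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Hypothetical component scores, chosen only to show the arithmetic.
example_score = conll_f1(70.0, 68.0, 66.0)
print(example_score)
```

LEA F1 is reported separately because it is a different, link-based metric and is not part of this average.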

Abstract

Coreference resolution involves the task of identifying text spans within a discourse that pertain to the same real-world entity. While this task has been extensively explored in the English language, there has been a notable scarcity of publicly accessible resources and models for coreference resolution in South Asian languages. We introduce a Translated dataset for Multilingual Coreference Resolution (TransMuCoRes) in 31 South Asian languages using off-the-shelf tools for translation and word-alignment. Nearly all of the predicted translations successfully pass a sanity check, and 75% of English references align with their predicted translations. Using multilingual encoders, two off-the-shelf coreference resolution models were trained on a concatenation of TransMuCoRes and a Hindi coreference resolution dataset with manual annotations. The best performing model achieved a score of 64…
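
The abstract describes building the dataset by translating English text and using word alignment to carry annotations over to the translation. A minimal sketch of that general idea follows. It is not the authors' pipeline: the min/max projection heuristic, the function names, the toy alignment, and the reading of "references" as annotated mentions are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source-token span to the smallest target span covering all aligned tokens."""
    start, end = span
    targets: Set[int] = set()
    for i in range(start, end + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no token of the mention was aligned
    return (min(targets), max(targets))


def project_clusters(
    clusters: List[List[Span]], alignment: Dict[int, Set[int]]
) -> Tuple[List[List[Span]], float]:
    """Project every mention and return the projected clusters plus the aligned fraction."""
    projected: List[List[Span]] = []
    total = aligned = 0
    for cluster in clusters:
        new_cluster: List[Span] = []
        for span in cluster:
            total += 1
            target = project_span(span, alignment)
            if target is not None:
                aligned += 1
                new_cluster.append(target)
        if new_cluster:
            projected.append(new_cluster)
    rate = aligned / total if total else 0.0
    return projected, rate


# Toy data (hypothetical): source token index -> aligned target token indices.
alignment = {0: {0}, 1: {3}, 2: {1}}
clusters = [[(0, 0), (2, 2)], [(3, 3)]]  # source token 3 has no alignment
projected, rate = project_clusters(clusters, alignment)
print(projected, rate)
```

In this toy case one of three mentions has no aligned target token, so it is dropped and the aligned fraction is two thirds. A figure such as the 75% alignment rate reported above could be computed in a similar way, though the paper's exact criterion is not stated in this summary.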

Code & Models

Repositories

ritwikmishra/transmucores
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN