COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework

Rajvee Sheth; Shubh Nisar; Heenaben Prajapati; Himanshu Beniwal; Mayank Singh

arXiv:2408.03125·cs.CL·August 7, 2024

TL;DR

COMMENTATOR is an annotation framework for code-mixed multilingual text, demonstrated on Hinglish. In the authors' human evaluations, it led to annotations 5x faster than the best baseline.

Contribution

The paper introduces an annotation tool designed specifically for code-mixed multilingual text and reports that it annotates faster than existing baselines.

Findings

01. Annotation 5x faster than the best baseline

02. Token-level and sentence-level language annotation demonstrated on Hinglish text

03. Tool is publicly available as open source

Abstract

As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently. In this paper, we introduce a code-mixed multilingual text annotation framework, COMMENTATOR, specifically designed for annotating code-mixed text. The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text. We perform robust qualitative human-based evaluations to showcase that COMMENTATOR led to 5x faster annotations than the best baseline. Our code is publicly available at https://github.com/lingo-iitgn/commentator. The demonstration video is available at https://bit.ly/commentator_video.
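To make the two annotation granularities in the abstract concrete, here is a minimal Python sketch of how token-level language tags for one Hinglish sentence could roll up into a sentence-level label. It is illustrative only: the `TokenAnnotation` class, the `sentence_label` helper, the tag set (`hi`, `en`, `other`), and the roll-up rule are all assumptions made for this example, not COMMENTATOR's actual data format or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TokenAnnotation:
    """One token with a language tag (hypothetical tag set: 'hi', 'en', 'other')."""
    token: str
    lang: str


def sentence_label(tokens):
    """Derive a sentence-level label from token-level tags.

    Assumed rule for this sketch: ignore 'other' tokens (punctuation, symbols);
    if more than one language remains, the sentence is 'code-mixed';
    otherwise it takes the single remaining language.
    """
    counts = Counter(t.lang for t in tokens if t.lang != "other")
    if not counts:
        return "other"
    if len(counts) > 1:
        return "code-mixed"
    return next(iter(counts))


# A romanized Hinglish sentence tagged token by token (hand-made example).
example = [
    TokenAnnotation("mujhe", "hi"),
    TokenAnnotation("yeh", "hi"),
    TokenAnnotation("movie", "en"),
    TokenAnnotation("pasand", "hi"),
    TokenAnnotation("hai", "hi"),
    TokenAnnotation(".", "other"),
]

print(sentence_label(example))
```

In this sketch the sentence is labeled code-mixed because both Hindi and English tokens appear; a real annotation scheme may use a richer tag set or let annotators assign the sentence label directly.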

Code & Models

Repositories

lingo-iitgn/commentator (Official)

Videos

Commentator: A Code-mixed Multilingual Text Annotation Framework

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling