COMMENTATOR: A Code-mixed Multilingual Text Annotation Framework
Rajvee Sheth, Shubh Nisar, Heenaben Prajapati, Himanshu Beniwal, Mayank Singh

TL;DR
COMMENTATOR is an annotation framework for code-mixed multilingual text, demonstrated on Hinglish. In human evaluations, it yields 5x faster annotation than the best baseline.
Contribution
It introduces a specialized annotation tool for code-mixed multilingual texts, achieving faster annotation speeds compared to existing baselines.
Findings
5x faster annotation than the best baseline
Effective token and sentence-level annotation for Hinglish
Open-source availability of the tool
Abstract
As the NLP community increasingly addresses challenges associated with multilingualism, robust annotation tools are essential to handle multilingual datasets efficiently. In this paper, we introduce a code-mixed multilingual text annotation framework, COMMENTATOR, specifically designed for annotating code-mixed text. The tool demonstrates its effectiveness in token-level and sentence-level language annotation tasks for Hinglish text. We perform robust qualitative human-based evaluations showing that COMMENTATOR leads to 5x faster annotations than the best baseline. Our code is publicly available at https://github.com/lingo-iitgn/commentator. The demonstration video is available at https://bit.ly/commentator_video.
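To make the two annotation granularities concrete, here is a minimal Python sketch of what token-level and sentence-level language labels for a Hinglish sentence might look like. Everything in it is an illustrative assumption rather than COMMENTATOR's actual data model: the `TokenAnnotation` class, the tag set (`hi`, `en`, `other`), and the majority-vote rule used to derive a sentence label are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TokenAnnotation:
    """One token with its language tag (hypothetical schema)."""
    token: str
    lang: str  # assumed tag set: "hi" (Hindi), "en" (English), "other"


def majority_language(tokens):
    """Derive a sentence-level label by majority vote over token tags.

    This rule is an assumption for illustration; tokens tagged "other"
    (punctuation, symbols) are ignored when counting.
    """
    counts = Counter(t.lang for t in tokens if t.lang != "other")
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]


# Token-level annotation of a romanized Hinglish sentence.
sentence = [
    TokenAnnotation("mujhe", "hi"),
    TokenAnnotation("yeh", "hi"),
    TokenAnnotation("movie", "en"),
    TokenAnnotation("bahut", "hi"),
    TokenAnnotation("pasand", "hi"),
    TokenAnnotation("hai", "hi"),
    TokenAnnotation("!", "other"),
]

# Sentence-level label derived from the token tags.
sentence_label = majority_language(sentence)
```

In this sketch the sentence label is computed from the token tags; in an actual annotation workflow both levels may instead be labeled directly by human annotators.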
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
