MIPE: A Metric Independent Pipeline for Effective Code-Mixed NLG Evaluation
Ayush Garg, Sammed S Kagi, Vivek Srivastava, Mayank Singh

TL;DR
This paper introduces MIPE, a new evaluation pipeline for code-mixed NLG that enhances the correlation between automatic metrics and human judgments, addressing the challenges posed by linguistic diversity in code-mixed text.
Contribution
MIPE is a metric-independent evaluation framework that improves the assessment of code-mixed NLG outputs, adaptable to various language pairs and tasks.
Findings
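MIPE significantly improves the correlation between automatic evaluation metrics and human judgments on generated code-mixed text.
Its effectiveness is demonstrated on machine-generated Hinglish (Hindi-English) sentences from the HinGE corpus.
The approach can be extended to other code-mixed language pairs, NLG tasks, and evaluation metrics.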
Abstract
Code-mixing is the phenomenon of mixing words and phrases from two or more languages in a single utterance of speech or text. Due to its high linguistic diversity, code-mixing presents several challenges in evaluating standard natural language generation (NLG) tasks. Various widely popular metrics perform poorly on code-mixed NLG tasks. To address this challenge, we present a metric-independent evaluation pipeline, MIPE, that significantly improves the correlation between evaluation metrics and human judgments on the generated code-mixed text. As a use case, we demonstrate the performance of MIPE on machine-generated Hinglish (code-mixing of Hindi and English) sentences from the HinGE corpus. The proposed evaluation strategy can be extended to other code-mixed language pairs, NLG tasks, and evaluation metrics with minimal to no effort.
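The abstract's central quantity is the correlation between an automatic metric's scores and human judgments. The sketch below shows one common way such a correlation could be computed (Pearson's r) and compared before and after some preprocessing step. It is an illustration only: the abstract does not say which correlation measure the authors use or what steps MIPE consists of, and the helper `pearson`, the variable names, and all numbers are hypothetical toy values, not results from the paper or the HinGE corpus.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (assumption): one human rating per generated sentence,
# plus scores from the same automatic metric computed on raw text ("before")
# and on text passed through some evaluation pipeline ("after").
human_ratings = [2, 4, 5, 7, 9]
metric_before = [0.30, 0.10, 0.50, 0.20, 0.40]
metric_after = [0.15, 0.30, 0.45, 0.60, 0.80]

r_before = pearson(metric_before, human_ratings)
r_after = pearson(metric_after, human_ratings)

print(f"correlation before: {r_before:.3f}")
print(f"correlation after:  {r_after:.3f}")
```

A pipeline is judged useful in this framing if the second correlation is higher than the first for the same metric and the same human ratings. A rank-based measure such as Spearman's correlation could be substituted without changing the comparison.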
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
