PreCogIIITH at HinglishEval : Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality
Prashant Kodali, Tanmay Sachan, Akshay Goindani, Anmol Goel, Naman Ahuja, Manish Shrivastava, Ponnurangam Kumaraguru

TL;DR
This paper presents models that predict the quality of machine-generated code-mixed text by leveraging code-mixing metrics and language model embeddings, addressing an open evaluation challenge in low-resource multilingual settings.
Contribution
It introduces a novel approach combining code-mixing metrics and language model embeddings to assess the quality of synthetic code-mixed text.
Findings
Models effectively predict code-mix quality ratings.
Combining metrics and embeddings improves evaluation accuracy.
Approach addresses low-resource code-mixing evaluation challenges.
Abstract
Code-Mixing is the phenomenon of mixing two or more languages in a speech event and is prevalent in multilingual societies. Given the low-resource nature of Code-Mixing, machine generation of code-mixed text is a prevalent approach for data augmentation. However, evaluating the quality of such machine-generated code-mixed text is an open problem. In our submission to HinglishEval, a shared task co-located with INLG 2022, we attempt to model the factors that impact the quality of synthetically generated code-mixed text by predicting ratings for code-mix quality.
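The abstract does not say which code-mixing metrics the models use. As an illustration only, here is a minimal sketch of one widely used metric, the Code-Mixing Index (CMI), which scores an utterance as 100 × (1 − max language token count / number of language-tagged tokens). The function name, the tag labels (`hi`, `en`, `univ`), and the choice of neutral tags are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral_tags=("univ", "ne", "other")):
    """Sketch of the Code-Mixing Index for one utterance.

    lang_tags: one language tag per token, e.g. ["hi", "en", "univ"].
    neutral_tags: tags treated as language-independent (hypothetical labels).
    Returns a value in [0, 100); 0 means the utterance is monolingual
    or contains no language-tagged tokens.
    """
    # Count only tokens that carry a language tag.
    counts = Counter(tag for tag in lang_tags if tag not in neutral_tags)
    lang_tokens = sum(counts.values())
    if lang_tokens == 0:
        return 0.0
    # Share of tokens that do NOT belong to the dominant language.
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / lang_tokens)


if __name__ == "__main__":
    tags = ["hi", "hi", "en", "en", "hi", "univ"]
    print(code_mixing_index(tags))
```

A score like this could be one entry in a feature vector alongside sentence embeddings from a language model, with a regressor or classifier trained to predict the quality rating; that pipeline is likewise a guess at the general shape, not a description of the submitted system.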
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
