PreCogIIITH at HinglishEval: Leveraging Code-Mixing Metrics & Language Model Embeddings To Estimate Code-Mix Quality

Prashant Kodali; Tanmay Sachan; Akshay Goindani; Anmol Goel; Naman Ahuja; Manish Shrivastava; Ponnurangam Kumaraguru

arXiv:2206.07988 · cs.AI · June 17, 2022

TL;DR

This paper presents models that predict the quality of machine-generated code-mixed text by leveraging code-mixing metrics and language model embeddings, addressing an open evaluation challenge in low-resource multilingual settings.

Contribution

It introduces a novel approach combining code-mixing metrics and language model embeddings to assess the quality of synthetic code-mixed text.
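As a rough illustration of what combining a code-mixing metric with an embedding might look like, the sketch below computes the Code-Mixing Index (CMI), a commonly used metric, from per-token language tags and prepends it to a sentence embedding to form one feature vector. The listing does not say which metrics, embeddings, or regressor the authors used, so the choice of CMI, the function names, the `"other"` tag for language-independent tokens, and the simple concatenation are all assumptions made for this example.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral_tag="other"):
    """Code-Mixing Index for one sentence, in the range 0 to 100.

    lang_tags: one language tag per token, e.g. "hi", "en", or neutral_tag
    for language-independent tokens (punctuation, named entities, ...).
    The tag names are illustrative assumptions, not taken from the paper.
    """
    n = len(lang_tags)
    # Count only tokens that belong to a language.
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    u = n - sum(counts.values())  # number of language-independent tokens
    if n == u:
        # No language-tagged tokens at all: treat as not code-mixed.
        return 0.0
    dominant = max(counts.values())  # tokens in the most frequent language
    return 100.0 * (1.0 - dominant / (n - u))


def build_features(lang_tags, embedding):
    """Hypothetical feature builder: metric value followed by the embedding."""
    return [code_mixing_index(lang_tags)] + list(embedding)


# Toy input: two Hindi tokens, two English tokens, one neutral token.
tags = ["hi", "hi", "en", "en", "other"]
features = build_features(tags, [0.1, 0.2])  # placeholder 2-d embedding
```

A vector built this way could then be passed to any off-the-shelf regressor or classifier trained on human quality ratings; which model works best is not stated in this listing.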

Findings

1. Models effectively predict code-mix quality ratings.

2. Combining metrics and embeddings improves evaluation accuracy.

3. Approach addresses low-resource code-mixing evaluation challenges.

Abstract

Code-mixing is the phenomenon of mixing two or more languages in a single speech event, and it is prevalent in multilingual societies. Given the low-resource nature of code-mixing, machine generation of code-mixed text is a common approach to data augmentation. However, evaluating the quality of such machine-generated code-mixed text is an open problem. In our submission to HinglishEval, a shared task co-located with INLG 2022, we attempt to model the factors that impact the quality of synthetically generated code-mixed text by predicting ratings for code-mix quality.

Code & Models

Repositories

prashantkodali/precogiiith-hinglisheval-inlg-2022 (official)

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling