Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text

Vivek Srivastava; Mayank Singh

arXiv:2108.01861·cs.CL·August 5, 2021

TL;DR

This paper evaluates the quality of synthetically generated code-mixed Hinglish sentences, analyzing factors affecting their perceived quality through human annotations and proposing predictive subtasks.

Contribution

It introduces two new subtasks for predicting quality ratings and annotator disagreement, advancing understanding of factors influencing code-mixed text quality.

Findings

01

Human annotators' ratings reveal key quality factors.

02

Proposed subtasks improve understanding of quality perception.

03

Synthetic Hinglish generation approaches are systematically evaluated.

Abstract

In this shared task, we invite participating teams to investigate the factors that influence the quality of code-mixed text generation systems. We synthetically generate code-mixed Hinglish sentences using two distinct approaches and employ human annotators to rate the generation quality. We propose two subtasks on the synthetic Hinglish dataset: quality rating prediction and annotators' disagreement prediction. The proposed subtasks aim to surface the reasoning behind, and explanations for, the factors that influence the quality and human perception of code-mixed text.
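As a rough illustration of what the two subtasks ask a system to predict, the sketch below derives both targets from per-annotator ratings of one sentence. It rests on assumptions the abstract does not state: that each sentence carries numeric ratings from at least two annotators, that the quality target is their mean, and that disagreement is the spread (maximum minus minimum) of the ratings. The function name `subtask_targets` and the sample ratings are hypothetical.

```python
from statistics import mean


def subtask_targets(ratings):
    """Return (quality, disagreement) for one synthetic sentence.

    Assumptions (not taken from the abstract):
    - `ratings` holds numeric scores from two or more annotators;
    - quality is the mean of those scores;
    - disagreement is the gap between the highest and lowest score.
    """
    if len(ratings) < 2:
        raise ValueError("need ratings from at least two annotators")
    quality = mean(ratings)
    disagreement = max(ratings) - min(ratings)
    return quality, disagreement


# Hypothetical ratings for a single generated sentence.
quality, disagreement = subtask_targets([7, 9])
print(quality, disagreement)
```

Under these assumptions, a model for the first subtask would regress or classify toward `quality`, and a model for the second toward `disagreement`; the task's actual target definitions may differ.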

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques