Quality Evaluation of the Low-Resource Synthetically Generated Code-Mixed Hinglish Text
Vivek Srivastava, Mayank Singh

TL;DR
This paper evaluates the quality of synthetically generated code-mixed Hinglish sentences, analyzing factors affecting their perceived quality through human annotations and proposing predictive subtasks.
Contribution
It introduces two new subtasks for predicting quality ratings and annotator disagreement, advancing understanding of factors influencing code-mixed text quality.
Findings
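Human annotators rate the quality of synthetically generated Hinglish sentences.
Two subtasks, quality rating prediction and annotator disagreement prediction, are proposed to explain what drives perceived quality.
Two distinct synthetic Hinglish generation approaches are used to build the dataset.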
Abstract
In this shared task, we invite participating teams to investigate the factors influencing the quality of code-mixed text generation systems. We synthetically generate code-mixed Hinglish sentences using two distinct approaches and employ human annotators to rate the generation quality. We propose two subtasks on the synthetic Hinglish dataset: quality rating prediction and annotator disagreement prediction. The proposed subtasks are intended to bring out the reasoning behind, and explanations of, the factors influencing the quality and human perception of code-mixed text.
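To make the two subtasks concrete, the following minimal sketch shows one way the prediction targets could be derived from per-sentence annotator ratings. The abstract does not specify the rating scale, the number of annotators, or how ratings are aggregated, so everything here is an assumption for illustration: the function name `derive_targets`, the use of the mean as the quality target, the use of the rating range as the disagreement target, and the placeholder sentence ids and scores.

```python
from statistics import mean


def derive_targets(ratings):
    """Return (quality, disagreement) for one sentence.

    Assumptions (not taken from the paper): `ratings` holds one integer
    score per annotator, quality is their mean, and disagreement is the
    gap between the highest and lowest score.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    quality = mean(ratings)
    disagreement = max(ratings) - min(ratings)
    return quality, disagreement


# Hypothetical ratings keyed by placeholder sentence ids.
samples = {
    "sentence_a": [8, 9],
    "sentence_b": [3, 7],
}

targets = {sid: derive_targets(scores) for sid, scores in samples.items()}
for sid, (quality, disagreement) in targets.items():
    print(sid, quality, disagreement)
```

Under these assumptions, a quality rating model would be trained to predict the first value from the sentence text, and a disagreement model the second.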
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
