Marathi-English Code-mixed Text Generation
Dhiraj Amin, Sharvari Govilkar, Sagar Kulkarni, Yash Shashikant Lalit, Arshi Ajaz Khwaja, Daries Xavier, Sahil Girijashankar Gupta

TL;DR
This paper presents a novel Marathi-English code-mixed text generation method that effectively produces comprehensible hybrid sentences, aiding multilingual NLP applications despite resource limitations.
Contribution
It introduces a new algorithm for Marathi-English code-mixed text generation and evaluates it using standard code-mixing metrics on a substantial dataset.
Findings
Achieved an average CMI of 0.2 across 2987 code-mixed questions, indicating moderate code mixing.
Achieved an average DCM of 7.4 on the same questions, showing effective code blending.
Demonstrated potential for improved multilingual NLP tools.
Abstract
Code-mixing, the blending of linguistic elements from distinct languages to form meaningful sentences, is common in multilingual settings, yielding hybrid languages like Hinglish and Minglish. Marathi, India's third most spoken language, often integrates English for precision and formality. Developing code-mixed language systems, like Marathi-English (Minglish), faces resource constraints. This research introduces a Marathi-English code-mixed text generation algorithm, assessed with Code Mixing Index (CMI) and Degree of Code Mixing (DCM) metrics. Across 2987 code-mixed questions, it achieved an average CMI of 0.2 and an average DCM of 7.4, indicating effective and comprehensible code-mixed sentences. These results offer potential for enhanced NLP tools, bridging linguistic gaps in multilingual societies.
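To make the CMI figure concrete, here is a minimal sketch of how a Code Mixing Index could be computed from per-token language tags. The abstract does not give the formula the authors used, so this assumes the commonly cited definition (one minus the share of the dominant language among language-tagged tokens) scaled to the 0 to 1 range, which is consistent with the reported 0.2. The function name, the tag labels ("mr", "en", "other"), and the sample tags are hypothetical and are not taken from the paper. DCM is left out because its definition is not stated here.

```python
from collections import Counter

def code_mixing_index(tags, neutral="other"):
    """Return an assumed CMI in [0, 1] for one sentence.

    tags    -- one language label per token, e.g. "mr", "en", "other"
    neutral -- label for language-independent tokens (numbers, names, punctuation)

    Assumed formula: 1 - max_lang_count / (n - u), where n is the token
    count and u is the number of neutral tokens; 0 if n == u.
    """
    n = len(tags)
    lang_counts = Counter(t for t in tags if t != neutral)
    tagged = sum(lang_counts.values())  # this is n - u
    if tagged == 0:
        return 0.0  # no language-tagged tokens, so nothing is mixed
    return 1.0 - max(lang_counts.values()) / tagged

# Hypothetical tags for a five-token sentence: three Marathi tokens,
# one English token, and one neutral token.
sample_tags = ["mr", "mr", "en", "mr", "other"]
sample_cmi = code_mixing_index(sample_tags)
```

A monolingual sentence scores 0 under this definition, and the score grows as the non-dominant language takes a larger share of the tokens. A corpus-level figure such as the reported average would then be the mean of the per-sentence scores, again assuming the authors averaged per sentence.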
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Multilingual Education and Policy
