CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations
Divyaksh Shukla, Ritesh Baviskar, Dwijesh Gohil, Aniket Tiwari, Atul Shree, Ashutosh Modi

TL;DR
This paper introduces CoMuMDR, a multi-modal, multi-domain, code-mixed Hindi-English corpus annotated for discourse parsing in conversations, and shows that current models struggle in such realistic, diverse conversational settings.
Contribution
The creation of CoMuMDR, an annotated, multi-modal corpus for discourse parsing in code-mixed, multi-domain conversations. It addresses a gap in existing conversational discourse parsing datasets, which consist of written English dialogues restricted to a single domain.
Findings
State-of-the-art models perform poorly on CoMuMDR, indicating challenges in multi-domain code-mixed discourse parsing.
The corpus includes both audio and transcribed text and is annotated with nine discourse relations.
Results emphasize the need for developing better models for realistic, multi-domain, code-mixed conversational data.
Abstract
Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current discourse parsing datasets based on conversations consist of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. The corpus (code-mixed in Hindi and English) has both audio and transcribed text and is annotated with nine discourse relations. We experiment with various SoTA baseline models; their poor performance highlights the challenges of a multi-domain code-mixed corpus and points towards the need for developing better models for such realistic settings.
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Natural Language Processing Techniques
