Mitigating Category Imbalance: Fosafer System for the Multimodal Emotion and Intent Joint Understanding Challenge
Honghong Wang, Yankai Wang, Dejun Zhang, Jing Deng, Rong Zheng

TL;DR
This paper proposes Fosafer, a multimodal approach using data augmentation, specialized loss functions, and model fine-tuning to improve emotion and intent recognition in Mandarin, effectively addressing category imbalance and modal competition.
Contribution
The paper introduces Fosafer, combining data augmentation, SampleWeighted Focal Contrastive loss, modal dropout, and fine-tuning of Hubert to enhance joint emotion and intent understanding in Mandarin.
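The summary does not say how modal dropout is implemented, so the following is only a minimal sketch of the general idea: during training, an entire modality's features are occasionally zeroed so the model cannot lean on a single dominant modality. The function name `modal_dropout`, the drop probability `p`, the zero-fill choice, and the rule that at least one modality is always kept are all assumptions made for illustration, not details from the paper.

```python
import random


def modal_dropout(features, p=0.3, rng=random):
    """Zero out whole modalities at random (hypothetical sketch).

    features: dict mapping a modality name (e.g. "text", "audio", "video")
              to a list of floats.
    p:        assumed probability of dropping each modality independently.
    rng:      object with random() and choice(), e.g. random.Random(seed).

    At least one modality is always kept so the sample is never empty.
    """
    names = list(features)
    dropped = [name for name in names if rng.random() < p]
    if len(dropped) == len(names):
        # Every modality was selected for dropping: restore one at random.
        dropped.remove(rng.choice(dropped))
    return {
        name: ([0.0] * len(vec) if name in dropped else list(vec))
        for name, vec in features.items()
    }


sample = {"text": [0.2, 0.5], "audio": [1.0, 0.1], "video": [0.7, 0.3]}
augmented = modal_dropout(sample, p=0.3, rng=random.Random(0))
```

In this sketch the dropout would be applied only at training time; at inference all modalities would be passed through unchanged.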
Findings
Achieved second-best performance in the Mandarin challenge.
Demonstrated effectiveness of data augmentation and specialized loss functions.
Showed improved recognition of minority and semantically similar classes.
Abstract
This paper presents the Fosafer approach to Track 2 (Mandarin) of the Multimodal Emotion and Intent Joint Understanding challenge, which focuses on joint recognition of emotion and intent in Mandarin despite category imbalance. To alleviate this imbalance, we apply a variety of data augmentation techniques across the text, video, and audio modalities. Additionally, we introduce the SampleWeighted Focal Contrastive loss, designed to address the difficulty of recognizing minority-class samples and samples that are semantically similar and therefore hard to distinguish. Moreover, we fine-tune the Hubert model to adapt it to joint emotion and intent recognition. To mitigate modal competition, we introduce a modal dropout strategy. Final predictions are determined by plurality voting. The experimental results demonstrate the effectiveness of our method,…
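As a rough illustration of the plurality voting step, the sketch below combines the labels predicted by several models for each sample and keeps the most frequent one. The function names, the example labels, and the tie-breaking rule (the earliest-listed model wins) are assumptions; the abstract does not specify how ties are resolved or how many models are combined.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the label predicted most often.

    Ties are broken in favor of the label that appears first in
    `predictions` (an assumed rule, not taken from the paper).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def vote_per_sample(model_outputs):
    """Apply plurality voting sample by sample.

    model_outputs: list of per-model label lists, all of equal length.
    """
    return [plurality_vote(list(labels)) for labels in zip(*model_outputs)]


# Three hypothetical models, each labeling the same two utterances.
model_outputs = [
    ["happy", "neutral"],
    ["happy", "sad"],
    ["angry", "sad"],
]
final = vote_per_sample(model_outputs)
```

Plurality voting needs only hard labels from each model, so it can combine models whose output scores are not calibrated against each other.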
