OmniMER: Auxiliary-Enhanced LLM Adaptation for Indonesian Multimodal Emotion Recognition
Xueming Yan, Boyan Xu, Yaochu Jin, Lixian Xiao, Wenlong Ye, Runyang Cai, Zeqi Zheng, Jingfa Liu, Aimin Yang, Yongduan Song

TL;DR
This paper introduces IndoMER, a new multimodal Indonesian emotion recognition dataset, and proposes OmniMER, an auxiliary-enhanced adaptation framework that improves emotion recognition accuracy by leveraging modality-specific auxiliary tasks.
Contribution
The paper presents the first Indonesian multimodal emotion recognition benchmark and a novel adaptation framework that enhances model performance through auxiliary modality-specific perception tasks.
Findings
OmniMER achieves significantly higher F1 scores than its base model, Qwen2.5-Omni.
The IndoMER dataset captures realistic challenges such as cross-modal inconsistency and long-tailed class distributions.
Cross-lingual evaluation shows the framework's generalizability.
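Because the findings pair F1 scores with a long-tailed class distribution, it may help to see how the choice of F1 average changes the picture. The sketch below is illustrative only: the summary does not say which average the paper reports, and the toy labels, the two class names, and the function names are assumptions made up for this example, not data or code from IndoMER or OmniMER.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by class frequency in the ground truth."""
    counts = Counter(y_true)
    total = len(y_true)
    return sum(n * per_class_f1(y_true, y_pred, l) for l, n in counts.items()) / total


# Hypothetical long-tailed toy set: one frequent class, one rare class.
y_true = ["neutral"] * 8 + ["anger"] * 2
# A degenerate classifier that always predicts the frequent class.
y_pred = ["neutral"] * 10

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

On this toy set the always-"neutral" classifier looks far better under the frequency-weighted average than under the macro average, which is why the averaging scheme matters when comparing models on a long-tailed benchmark.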
Abstract
Indonesian, spoken by over 200 million people, remains underserved in multimodal emotion recognition research despite its dominant presence on Southeast Asian social media platforms. We introduce IndoMER, the first multimodal emotion recognition benchmark for Indonesian, comprising 1,944 video segments from 203 speakers with temporally aligned text, audio, and visual annotations across seven emotion categories. The dataset exhibits realistic challenges including cross-modal inconsistency and long-tailed class distributions shaped by Indonesian cultural communication norms. To address these challenges, we propose OmniMER, a multimodal adaptation framework built upon Qwen2.5-Omni that enhances emotion recognition through three auxiliary modality-specific perception tasks: emotion keyword extraction for text, facial expression analysis for video, and prosody analysis for audio. These…
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Multimodal Machine Learning Applications
