Multi-Track Multimodal Learning on iMiGUE: Micro-Gesture and Emotion Recognition

Arman Martirosyan; Shahane Tigranyan; Maria Razzhivina; Artak Aslanyan; Nazgul Salikhova; Ilya Makarov; Andrey Savchenko; Aram Avetisyan

arXiv:2512.23291·cs.CV·December 30, 2025

Open Access

TL;DR

This paper introduces two multimodal frameworks for micro-gesture recognition and emotion prediction using video and skeletal data, achieving high accuracy on the iMiGUE dataset and securing second place in the MiGA 2025 Challenge.

Contribution

The paper presents novel multimodal fusion methods that combine RGB, 3D pose, facial, and contextual data for fine-grained behavior analysis and emotion recognition.

Findings

01. Achieved high accuracy in micro-gesture classification.

02. Secured 2nd place in the MiGA 2025 Challenge.

03. Demonstrated effective multimodal fusion for emotion prediction.

Abstract

Micro-gesture recognition and behavior-based emotion prediction are both highly challenging tasks that require modeling subtle, fine-grained human behaviors, primarily leveraging video and skeletal pose data. In this work, we present two multimodal frameworks designed to tackle both problems on the iMiGUE dataset. For micro-gesture classification, we explore the complementary strengths of RGB and 3D pose-based representations to capture nuanced spatio-temporal patterns. To comprehensively represent gestures, video and skeletal embeddings are extracted using MViTv2-S and 2s-AGCN, respectively. They are then integrated through a Cross-Modal Token Fusion module to combine spatial and pose information. For emotion recognition, our framework extends to behavior-based emotion prediction, a binary classification task that identifies emotional states from visual cues. We leverage facial and…
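
The abstract does not spell out how the Cross-Modal Token Fusion module works internally. As a rough illustration only, the sketch below shows one common way to fuse two token streams: bidirectional single-head cross-attention with residual connections, followed by mean pooling and a linear classifier. Every name, shape, and weight here (`cross_attend`, `fuse_tokens`, `dim`, `num_classes`, the random parameters) is a hypothetical placeholder, and the inputs are random stand-ins for MViTv2-S video tokens and 2s-AGCN pose tokens assumed to be already projected to a shared width. It is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attend(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over the context tokens."""
    q = queries @ w_q                          # (n_q, dim)
    k = context @ w_k                          # (n_c, dim)
    v = context @ w_v                          # (n_c, dim)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c), scaled dot product
    return softmax(scores, axis=-1) @ v        # (n_q, dim)


def fuse_tokens(video_tokens, pose_tokens, params):
    """Hypothetical fusion: bidirectional cross-attention, pooling, linear head."""
    # Video tokens query the pose stream, and vice versa, with residual connections.
    video_fused = video_tokens + cross_attend(video_tokens, pose_tokens, *params["v2p"])
    pose_fused = pose_tokens + cross_attend(pose_tokens, video_tokens, *params["p2v"])
    # Mean-pool each stream over its tokens and concatenate into one clip descriptor.
    pooled = np.concatenate([video_fused.mean(axis=0), pose_fused.mean(axis=0)])
    return pooled @ params["w_cls"]            # (num_classes,) unnormalized logits


rng = np.random.default_rng(0)
dim, num_classes = 64, 8                       # placeholder sizes, not from the paper


def random_projection():
    """Random (dim, dim) matrix scaled to keep activations in a reasonable range."""
    return rng.standard_normal((dim, dim)) / np.sqrt(dim)


params = {
    "v2p": tuple(random_projection() for _ in range(3)),   # w_q, w_k, w_v
    "p2v": tuple(random_projection() for _ in range(3)),
    "w_cls": rng.standard_normal((2 * dim, num_classes)) / np.sqrt(2 * dim),
}

video_tokens = rng.standard_normal((16, dim))  # stand-in for video backbone tokens
pose_tokens = rng.standard_normal((10, dim))   # stand-in for skeleton backbone tokens

logits = fuse_tokens(video_tokens, pose_tokens, params)
probs = softmax(logits)
print(logits.shape, int(probs.argmax()))
```

In this sketch the two streams may have different token counts, since attention only requires a shared feature width. A trained system would learn the projection and classifier weights rather than sampling them.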

Taxonomy

Topics: Emotion and Mood Recognition · Human Pose and Action Recognition · Multimodal Machine Learning Applications