MA-Bench: Towards Fine-grained Micro-Action Understanding
Kun Li, Jihao Gu, Fei Wang, Zhiliang Wu, Hehe Fan, Dan Guo

TL;DR
MA-Bench is a benchmark, paired with the MA-Bench-Train training corpus, for evaluating and improving multimodal large language models (MLLMs) on fine-grained micro-action understanding and human behavior analysis.
Contribution
The paper presents MA-Bench, a new benchmark of 1,000 videos and 12,000 structured question-answer pairs, together with MA-Bench-Train, a 20.5K-video training corpus for advancing micro-action perception in multimodal models.
Findings
Existing MLLMs (23 were evaluated) struggle to capture motion granularity and fine-grained body-part dynamics.
Fine-tuning on MA-Bench-Train improves micro-action reasoning.
The benchmark enables systematic assessment of both recognition accuracy and action interpretation.
Abstract
Despite the rapid development of Multimodal Large Language Models (MLLMs), their potential for Micro-Action understanding, which plays a vital role in human emotion analysis, remains unexplored due to the absence of specialized benchmarks. To tackle this issue, we present MA-Bench, a benchmark comprising 1,000 videos and a three-tier evaluation architecture that progressively examines micro-action perception, relational comprehension, and interpretive reasoning. MA-Bench contains 12,000 structured question-answer pairs, enabling systematic assessment of both recognition accuracy and action interpretation. Results from 23 representative MLLMs reveal significant challenges in capturing motion granularity and fine-grained body-part dynamics. To address these challenges, we further construct MA-Bench-Train, a large-scale training corpus with 20.5K videos annotated with structured…
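As a rough illustration of what a three-tier assessment over structured question-answer pairs could look like, the sketch below computes accuracy separately for each tier. The tier names follow the abstract; the record fields (`tier`, `answer`, `prediction`), the function name, and the toy records are assumptions made for this example and do not reflect the actual MA-Bench data format or scoring protocol.

```python
from collections import defaultdict

# Tier names follow the abstract; everything else here is hypothetical.
TIERS = ("perception", "relational_comprehension", "interpretive_reasoning")


def per_tier_accuracy(records):
    """Return {tier: accuracy} for records with 'tier', 'answer', 'prediction' keys.

    Answers are compared after stripping whitespace and upper-casing,
    which suits multiple-choice labels such as "A" or "b ".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        tier = rec["tier"]
        total[tier] += 1
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[tier] += 1
    # Skip tiers with no questions to avoid division by zero.
    return {tier: correct[tier] / total[tier] for tier in TIERS if total[tier]}


# Toy records invented for illustration only (not benchmark data).
toy_records = [
    {"tier": "perception", "answer": "A", "prediction": "A"},
    {"tier": "perception", "answer": "B", "prediction": "C"},
    {"tier": "relational_comprehension", "answer": "D", "prediction": "d "},
    {"tier": "interpretive_reasoning", "answer": "A", "prediction": "B"},
    {"tier": "interpretive_reasoning", "answer": "C", "prediction": "C"},
]

scores = per_tier_accuracy(toy_records)
for tier, acc in scores.items():
    print(f"{tier}: {acc:.2f}")
```

Reporting one score per tier, rather than a single pooled accuracy, makes it easier to see whether a model fails at low-level perception or only at the later interpretive stage.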
