FG-SGL: Fine-Grained Semantic Guidance Learning via Motion Process Decomposition for Micro-Gesture Recognition

Jinsheng Wei; Zhaodi Xu; Guanming Lu; Haoyu Chen; Jingjie Yan

arXiv:2603.16269 · cs.CV · March 18, 2026

Open Access

TL;DR

This paper introduces FG-SGL, a novel framework for micro-gesture recognition that leverages fine-grained and category-level semantic guidance, along with a new annotated dataset and a multi-level contrastive optimization strategy, to improve recognition accuracy.

Contribution

The paper proposes FG-SGL, a framework that integrates fine-grained and category-level semantics to improve micro-gesture recognition. It is supported by a newly constructed, human-annotated textual dataset and a multi-level contrastive learning method.

Findings

01

FG-SGL achieves competitive recognition performance.

02

Fine-grained semantic guidance improves local motion feature learning.

03

The multi-level contrastive strategy effectively optimizes the model.

Abstract

Micro-gesture recognition (MGR) is challenging because inter-class variations are subtle. Existing methods rely on category-level supervision, which is insufficient for capturing subtle, localized motion differences. This paper therefore proposes a Fine-Grained Semantic Guidance Learning (FG-SGL) framework that jointly integrates fine-grained and category-level semantics to guide vision-language models in perceiving local MG motions. Within FG-SGL, the FG-SA module uses fine-grained semantic cues to guide the learning of local motion features, while the CP-A module enhances the separability of MG features through category-level semantic guidance. To support fine-grained semantic guidance, this work constructs a human-annotated textual dataset that describes the dynamic process of each MG along four refined semantic dimensions. Furthermore, a Multi-Level Contrastive Optimization strategy is designed to jointly…
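The abstract does not give the loss formulas, so the following is only a minimal sketch of what a two-level contrastive objective of this kind could look like, assuming a common CLIP-style design: a symmetric InfoNCE loss aligns each sample's local motion feature with its fine-grained description, separately for each of the four semantic dimensions (indexed 0 to 3 here, since the abstract does not name them), and a cross-entropy loss over cosine similarities to per-category text embeddings supplies the category-level signal. All function names, the data layout, the temperature `tau`, and the weight `lam` are hypothetical; this is not the authors' FG-SA, CP-A, or Multi-Level Contrastive Optimization implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cross_entropy(logits: List[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def info_nce(queries: List[Vector], keys: List[Vector], tau: float) -> float:
    """Symmetric InfoNCE: query i must match key i against all other keys in the batch."""
    n = len(queries)
    sim = [[cosine(q, k) / tau for k in keys] for q in queries]
    q2k = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    k2q = sum(cross_entropy([sim[j][i] for j in range(n)], i) for i in range(n)) / n
    return 0.5 * (q2k + k2q)


def category_loss(feats: List[Vector], class_text: List[Vector],
                  labels: List[int], tau: float) -> float:
    """Classify each feature by cosine similarity to per-category text embeddings."""
    losses = [cross_entropy([cosine(f, c) / tau for c in class_text], y)
              for f, y in zip(feats, labels)]
    return sum(losses) / len(losses)


def multi_level_loss(local_feats: List[List[Vector]], fine_text: List[List[Vector]],
                     global_feats: List[Vector], class_text: List[Vector],
                     labels: List[int], tau: float = 0.07, lam: float = 1.0) -> float:
    """Hypothetical two-level objective.

    local_feats[d][i]: local motion feature of sample i for semantic dimension d
    fine_text[d][i]:   text embedding of sample i's description for dimension d
    """
    dims = len(fine_text)
    fine = sum(info_nce(local_feats[d], fine_text[d], tau) for d in range(dims)) / dims
    coarse = category_loss(global_feats, class_text, labels, tau)
    return fine + lam * coarse
```

As a toy check, with two samples whose local features already point in the same direction as their own descriptions, the loss is close to zero; swapping the descriptions between the two samples makes the fine-grained term dominate and the loss rises sharply. Keeping the fine-grained term per dimension, rather than pooling all descriptions into one text embedding, is what lets each dimension supervise its own local feature in this sketch.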

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation