FLAG3D: A 3D Fitness Activity Dataset with Language Instruction
Yansong Tang, Jinpeng Liu, Aoyang Liu, Bin Yang, Wenxun Dai, Yongming Rao, Jiwen Lu, Jie Zhou, Xiu Li

TL;DR
FLAG3D is a comprehensive large-scale 3D fitness activity dataset with detailed language instructions, enabling advancements in action recognition, human mesh recovery, and language-guided action generation in diverse environments.
Contribution
The paper introduces FLAG3D, a novel dataset with high-quality 3D human poses, detailed language annotations, and diverse natural environment videos for fitness activities.
Findings
Facilitates cross-domain human action recognition
Enables dynamic human mesh recovery research
Supports language-guided human action generation
Abstract
With its continuously growing popularity around the world, fitness activity analytics has become an emerging research topic in computer vision. While a variety of new tasks and algorithms have been proposed recently, there is a growing hunger for data resources that offer high-quality data, fine-grained labels, and diverse environments. In this paper, we present FLAG3D, a large-scale 3D fitness activity dataset with language instruction containing 180K sequences of 60 categories. FLAG3D features the following three aspects: 1) accurate and dense 3D human poses captured with an advanced MoCap system to handle complex activities and large movements, 2) detailed and professional language instructions describing how to perform a specific activity, 3) versatile video resources from a high-tech MoCap system, rendering software, and cost-effective smartphones in natural environments. Extensive…
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Multimodal Machine Learning Applications
