FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

Dian Shao; Yue Zhao; Bo Dai; Dahua Lin

arXiv:2004.06704 · cs.CV · April 15, 2020 · 31 cites

TL;DR

FineGym is a hierarchical dataset of gymnastic videos for fine-grained action recognition. It provides temporal annotations at both the action and sub-action levels within a three-level semantic hierarchy, and it challenges existing methods to parse the structure of complex activities.

Contribution

The paper introduces FineGym, a gymnastic video dataset with temporal annotations at both the action and sub-action levels, enabling more detailed and more demanding action recognition research.

Findings

01. Existing methods struggle with fine-grained temporal parsing.

02. Hierarchical annotations reveal challenges in subtle action differentiation.

03. Systematic evaluation provides insights into current method limitations.

Abstract

On public benchmarks, current action recognition techniques have achieved great success. However, when used in real-world applications, e.g. sport analysis, which requires the capability of parsing an activity into phases and differentiating between subtly different actions, their performances remain far from being satisfactory. To take action recognition to a new level, we develop FineGym, a new dataset built on top of gymnastic videos. Compared to existing action recognition datasets, FineGym is distinguished in richness, quality, and diversity. In particular, it provides temporal annotations at both action and sub-action levels with a three-level semantic hierarchy. For example, a "balance beam" event will be annotated as a sequence of elementary sub-actions derived from five sets: "leap-jump-hop", "beam-turns", "flight-salto", "flight-handspring", and "dismount", where the…
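To make the annotation structure concrete, here is a minimal Python sketch of how one "balance beam" event could be stored as a time-ordered sequence of sub-actions, each tagged with one of the five sets named in the abstract. The class names (`Event`, `SubAction`), the field names, and the timestamps are hypothetical and chosen only for illustration; they are not the dataset's actual file format or API, and the third level of the hierarchy is omitted because the excerpt above does not describe it.

```python
from dataclasses import dataclass, field
from typing import List

# The five balance-beam sets named in the abstract.
BALANCE_BEAM_SETS = {
    "leap-jump-hop",
    "beam-turns",
    "flight-salto",
    "flight-handspring",
    "dismount",
}


@dataclass
class SubAction:
    """One temporally annotated sub-action (hypothetical schema)."""
    set_label: str
    start: float  # seconds from the start of the video (assumed unit)
    end: float

    def __post_init__(self) -> None:
        if self.set_label not in BALANCE_BEAM_SETS:
            raise ValueError(f"unknown set label: {self.set_label!r}")
        if self.end <= self.start:
            raise ValueError("end must be later than start")


@dataclass
class Event:
    """One temporally annotated event, e.g. a balance beam routine."""
    event_label: str
    start: float
    end: float
    sub_actions: List[SubAction] = field(default_factory=list)

    def set_sequence(self) -> List[str]:
        """Return the set labels of the sub-actions in temporal order."""
        ordered = sorted(self.sub_actions, key=lambda s: s.start)
        return [s.set_label for s in ordered]


# Made-up example values, not taken from the dataset.
event = Event(
    event_label="balance beam",
    start=0.0,
    end=90.0,
    sub_actions=[
        SubAction("dismount", 80.0, 85.0),
        SubAction("leap-jump-hop", 5.0, 8.0),
        SubAction("beam-turns", 20.0, 23.0),
    ],
)
print(event.set_sequence())
```

Storing sub-actions under their parent event keeps both annotation levels addressable: the event span supports coarse localization, while the ordered set labels expose the phase structure that fine-grained recognition has to recover.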

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization