MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization

Shimin Chen; Wei Li; Chen Chen; Jianyang Gu; Jiaming Chu; Xunqiang Tao; Yandong Guo

arXiv:2204.02688 · cs.CV · November 28, 2024 · 1 citation

Open Access

TL;DR

This paper introduces MM-SEAL, a large-scale video dataset for multi-person multi-grained spatio-temporal action localization, along with a new network, Faster-TAD, to improve localization performance.

Contribution

The paper presents a novel large-scale dataset with detailed annotations and a new network architecture for simultaneous proposal generation and labeling in action localization.

Findings

01. Atomic action features enhance complex activity localization.

02. Features pretrained on MM-SEAL improve results on other benchmarks.

03. Faster-TAD generates temporal proposals and labels simultaneously.

Abstract

In this paper, we introduce MM-SEAL, a novel large-scale video dataset for multi-person multi-grained spatio-temporal action localization in daily human life. We are the first to propose a benchmark for multi-person spatio-temporal complex activity localization, where complex semantics and long durations bring new challenges to localization tasks. We observe that a limited set of atomic actions can be combined into many complex activities. MM-SEAL provides both atomic action and complex activity annotations, yielding 111.7k atomic actions spanning 172 action categories and 17.7k complex activities spanning 200 activity categories. We explore the relationship between atomic actions and complex activities, finding that atomic action features can improve complex activity localization performance. We also propose a new network which generates temporal proposals and labels…
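To make the two annotation granularities concrete, here is a minimal sketch of how per-person atomic actions might be grouped under a complex activity. The schema, field names, and example labels are assumptions made for illustration; they are not taken from the MM-SEAL release format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: all names below are illustrative, not the dataset's format.
BBox = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class AtomicAction:
    person_id: int
    label: str
    start: float  # seconds
    end: float    # seconds
    boxes: List[BBox] = field(default_factory=list)  # per-frame person boxes


@dataclass
class ComplexActivity:
    label: str
    start: float  # seconds
    end: float    # seconds
    person_ids: List[int] = field(default_factory=list)


def atomic_actions_within(activity: ComplexActivity,
                          actions: List[AtomicAction]) -> List[AtomicAction]:
    """Return atomic actions by the activity's participants that overlap its time span."""
    return [
        a for a in actions
        if a.person_id in activity.person_ids
        and a.start < activity.end and a.end > activity.start
    ]


# Invented example: one complex activity composed of two atomic actions.
actions = [
    AtomicAction(0, "pick up", 1.0, 2.0),
    AtomicAction(0, "pour", 2.5, 4.0),
    AtomicAction(1, "walk", 0.0, 10.0),   # different person, excluded
    AtomicAction(0, "sit", 12.0, 13.0),   # outside the activity's time span
]
activity = ComplexActivity("make tea", 0.5, 5.0, person_ids=[0])
labels = [a.label for a in atomic_actions_within(activity, actions)]
```

Keeping the two levels as separate records linked by person and time reflects the observation that the same atomic actions can recur across many complex activities.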

Taxonomy

Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems · Stroke Rehabilitation and Recovery