Action Understanding with Multiple Classes of Actors
Chenliang Xu, Caiming Xiong, Jason J. Corso

TL;DR
This paper introduces a new dataset and approach for understanding actions involving multiple actor types in videos, demonstrating that joint modeling of actors and actions improves recognition performance.
Contribution
The paper is the first to jointly analyze multiple actor types and the actions they perform in videos. It contributes a dataset (A2D), benchmarks, and insights into multi-actor action understanding.
Findings
Joint actor-action modeling improves recognition accuracy.
Multi-scale analysis enhances understanding of complex actions.
The Actor-Action Dataset (A2D) supports future research.
Abstract
Despite the rapid progress, existing works on action understanding focus strictly on one type of action agent, which we call actor: a human adult, ignoring the diversity of actions performed by other actors. To overcome this narrow viewpoint, our paper marks the first effort in the computer vision community to jointly consider algorithmic understanding of various types of actors undergoing various actions. To begin with, we collect a large annotated Actor-Action Dataset (A2D) that consists of 3782 short videos and 31 temporally untrimmed long videos. We formulate the general actor-action understanding problem and instantiate it at various granularities: video-level single- and multiple-label actor-action recognition, and pixel-level actor-action segmentation. We propose and examine a comprehensive set of graphical models that consider the various types of interplay among actors and…
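The video-level single-label setting described in the abstract can be illustrated with a toy comparison between predicting actor and action independently and predicting them jointly over actor-action pairs. This is only a minimal sketch, not the graphical models examined in the paper: the label lists, the `VALID` set of allowed pairs, the additive scoring, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical label sets for illustration only; not the A2D label list.
ACTORS = ["adult", "dog", "car"]
ACTIONS = ["walking", "running", "rolling"]

# Hypothetical set of actor-action pairs considered valid (a car cannot walk).
VALID = {
    ("adult", "walking"), ("adult", "running"), ("adult", "rolling"),
    ("dog", "walking"), ("dog", "running"), ("dog", "rolling"),
    ("car", "running"), ("car", "rolling"),
}


def predict_independent(actor_scores, action_scores):
    """Pick the best actor and the best action separately."""
    actor = max(ACTORS, key=lambda a: actor_scores[a])
    action = max(ACTIONS, key=lambda v: action_scores[v])
    return actor, action


def predict_joint(actor_scores, action_scores, pair_scores=None):
    """Pick the best valid (actor, action) pair.

    The total score is the sum of the two unary scores plus an optional
    pairwise compatibility term (an assumed, simplified scoring rule).
    """
    pair_scores = pair_scores or {}

    def total(pair):
        actor, action = pair
        return actor_scores[actor] + action_scores[action] + pair_scores.get(pair, 0.0)

    candidates = [p for p in product(ACTORS, ACTIONS) if p in VALID]
    return max(candidates, key=total)


# Made-up classifier scores for one video.
actor_scores = {"adult": 0.2, "dog": 0.1, "car": 0.9}
action_scores = {"walking": 0.8, "running": 0.7, "rolling": 0.1}

independent = predict_independent(actor_scores, action_scores)
joint = predict_joint(actor_scores, action_scores)
```

With these made-up scores, the independent prediction combines the top actor and the top action into a pair outside `VALID`, while the joint prediction is restricted to valid pairs. Real joint models would learn the compatibility term from data rather than hard-coding it.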
Taxonomy
Topics: Human Pose and Action Recognition · AI-based Problem Solving and Planning · Anomaly Detection Techniques and Applications
