Human-centric Behavior Description in Videos: New Benchmark and Model

Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, and Yanning Zhang

arXiv:2310.02894·cs.CV·October 5, 2023

Open Access

TL;DR

This paper introduces a new human-centric video surveillance dataset with detailed individual behavior annotations and proposes a novel captioning model that achieves state-of-the-art results in describing behaviors at the person level.

Contribution

The paper presents a new dataset with detailed annotations for individual behaviors in surveillance videos and a novel captioning approach for fine-grained behavior description.

Findings

01

Achieved state-of-the-art results in person-level behavior captioning.

02

Created a dataset with 7,820 individuals across 1,012 videos with detailed annotations.

03

Enabled linking individuals to their behaviors for better surveillance analysis.

Abstract

In the domain of video surveillance, describing the behavior of each individual within the video is becoming increasingly essential, especially in complex scenarios with multiple individuals present. This is because describing each individual's behavior provides more detailed situational analysis, enabling accurate assessment and response to potential risks, ensuring the safety and harmony of public places. Currently, video-level captioning datasets cannot provide fine-grained descriptions for each individual's specific behavior. However, mere descriptions at the video-level fail to provide an in-depth interpretation of individual behaviors, making it challenging to accurately determine the specific identity of each individual. To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of…

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods