Human-centric Behavior Description in Videos: New Benchmark and Model
Lingru Zhou, Yiqi Gao, Manqing Zhang, Peng Wu, Peng Wang, and Yanning Zhang

TL;DR
This paper introduces a new human-centric video surveillance dataset with detailed individual behavior annotations and proposes a novel captioning model that achieves state-of-the-art results in describing behaviors at the person level.
Contribution
The paper presents a new dataset with detailed annotations for individual behaviors in surveillance videos and a novel captioning approach for fine-grained behavior description.
Findings
Achieved state-of-the-art results in person-level behavior captioning.
Created a dataset with 7,820 individuals across 1,012 videos with detailed annotations.
Enabled linking individuals to their behaviors for better surveillance analysis.
Abstract
In the domain of video surveillance, describing the behavior of each individual within the video is becoming increasingly essential, especially in complex scenarios with multiple individuals present. This is because describing each individual's behavior provides more detailed situational analysis, enabling accurate assessment and response to potential risks, ensuring the safety and harmony of public places. Currently, video-level captioning datasets cannot provide fine-grained descriptions for each individual's specific behavior. However, mere descriptions at the video-level fail to provide an in-depth interpretation of individual behaviors, making it challenging to accurately determine the specific identity of each individual. To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Surveillance and Tracking Methods
