Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
Seoik Jung, Taekyung Song, Yangro Lee, Sungjun Lee

TL;DR
This paper introduces a real-time violence detection method that trains on 1-2 second video clips labeled automatically by an LLM, reaching 95.25% accuracy on RWF-2000 and 83.25% on the long videos of UCF-Crime for surveillance applications.
Contribution
It presents a novel short-window sliding learning framework that leverages LLMs for auto-labeling to improve violence detection accuracy in real-time surveillance.
Findings
Achieves 95.25% accuracy on the RWF-2000 dataset.
Improves performance on long videos with 83.25% accuracy on UCF-Crime.
Demonstrates strong generalization and real-time applicability.
Abstract
This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip uses all of its frames to preserve temporal continuity, enabling precise recognition of rapid violent events. Experiments demonstrate that the proposed method achieves 95.25% accuracy on RWF-2000 and significantly improves performance on long videos (UCF-Crime: 83.25%), confirming its strong generalization and real-time applicability in intelligent surveillance systems.
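The clip-splitting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `short_window_indices`, the `stride_s` parameter, and the choice to drop a trailing partial window are assumptions made for illustration. Only the 1-2 second window length and the use of every frame within a clip come from the abstract. The LLM captioning step is not shown.

```python
from typing import List, Tuple


def short_window_indices(
    num_frames: int,
    fps: float,
    window_s: float = 1.0,   # the abstract states 1-2 second clips
    stride_s: float = 1.0,   # assumption: stride is not specified in the abstract
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges for consecutive short clips.

    Every frame inside a range is kept (no frame subsampling), which matches
    the abstract's statement that each short clip uses all of its frames.
    """
    if num_frames < 0 or fps <= 0 or window_s <= 0 or stride_s <= 0:
        raise ValueError("num_frames must be >= 0; fps, window_s, stride_s must be > 0")

    window = max(1, round(window_s * fps))  # clip length in frames
    stride = max(1, round(stride_s * fps))  # hop between clip starts in frames

    clips: List[Tuple[int, int]] = []
    start = 0
    # Assumption: a trailing window shorter than `window` frames is dropped.
    while start + window <= num_frames:
        clips.append((start, start + window))
        start += stride
    return clips


if __name__ == "__main__":
    # A 3-second video at 30 fps, split into non-overlapping 1-second clips.
    print(short_window_indices(90, 30.0, window_s=1.0, stride_s=1.0))
```

Setting `stride_s` below `window_s` yields overlapping windows, which is one plausible reading of "sliding"; each resulting frame range would then be passed to the captioning model for a label.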
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
