Interpretable Human Activity Recognition for Subtle Robbery Detection in Surveillance Videos
Bryan Jhoan Cazáres Leyva, Ulises Gachuz Davila, José Juan González Fonseca, Juan Irving Vasquez, Vanessa A. Camacho-Vázquez, Sergio Isahí Garrido-Castañeda

TL;DR
This paper introduces a pose-driven, interpretable system for real-time detection of subtle street robberies in surveillance videos, suitable for edge devices.
Contribution
It combines pose estimation, handcrafted features, and a Random Forest classifier into a real-time, interpretable pipeline for detecting subtle robbery events.
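The handcrafted-feature stage can be illustrated with a minimal Python sketch. It assumes keypoints arrive in the COCO 17-keypoint order (indices 5/6 shoulders, 7/8 elbows, 9/10 wrists, 11/12 hips) produced by YOLO pose models trained on COCO. From two consecutive frames, it computes the four descriptor families the paper names (hand speed, arm extension, proximity, and relative motion) for one aggressor-victim pair. The specific formulas, the hypothetical function names `pair_features` and `arm_extension`, and the use of raw pixel units are illustrative assumptions, not the authors' definitions.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Keypoints = Sequence[Point]  # 17 (x, y) points per person, COCO order (assumed)

# Assumption: COCO-17 keypoint indices, as emitted by COCO-trained YOLO pose models.
L_SHOULDER, R_SHOULDER = 5, 6
L_ELBOW, R_ELBOW = 7, 8
L_WRIST, R_WRIST = 9, 10
L_HIP, R_HIP = 11, 12


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def torso_center(kp: Keypoints) -> Point:
    # Midpoint between the shoulder center and the hip center.
    return midpoint(midpoint(kp[L_SHOULDER], kp[R_SHOULDER]),
                    midpoint(kp[L_HIP], kp[R_HIP]))


def arm_extension(kp: Keypoints, side: str) -> float:
    # Hypothetical definition: shoulder-to-wrist distance divided by the
    # summed segment lengths; 1.0 means a fully straight arm.
    if side == "left":
        s, e, w = L_SHOULDER, L_ELBOW, L_WRIST
    else:
        s, e, w = R_SHOULDER, R_ELBOW, R_WRIST
    arm_len = dist(kp[s], kp[e]) + dist(kp[e], kp[w])
    return dist(kp[s], kp[w]) / arm_len if arm_len > 0 else 0.0


def pair_features(aggr_prev: Keypoints, aggr_curr: Keypoints,
                  vict_prev: Keypoints, vict_curr: Keypoints,
                  fps: float) -> Dict[str, float]:
    """Hypothetical per-frame descriptors for one aggressor-victim pair."""
    # Fastest wrist displacement between frames, in pixels per second.
    hand_speed = max(dist(aggr_prev[i], aggr_curr[i]) for i in (L_WRIST, R_WRIST)) * fps
    extension = max(arm_extension(aggr_curr, "left"), arm_extension(aggr_curr, "right"))
    d_prev = dist(torso_center(aggr_prev), torso_center(vict_prev))
    d_curr = dist(torso_center(aggr_curr), torso_center(vict_curr))
    return {
        "hand_speed": hand_speed,
        "arm_extension": extension,
        "proximity": d_curr,                           # torso-to-torso distance (px)
        "relative_motion": (d_curr - d_prev) * fps,    # negative while approaching
    }
```

Vectors like these, likely normalized by body scale in a real system (omitted here for brevity), are the kind of interpretable input rows a Random Forest can be trained on.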
Findings
Effective detection on staged and internet videos
Generalizes across different scenes and viewpoints
Runs in real-time on NVIDIA Jetson Nano
Abstract
Non-violent street robberies (snatch-and-run) are difficult to detect automatically because they are brief, subtle, and often indistinguishable from benign human interactions in unconstrained surveillance footage. This paper presents a hybrid, pose-driven approach for detecting snatch-and-run events that combines real-time perception with an interpretable classification stage suitable for edge deployment. The system uses a YOLO-based pose estimator to extract body keypoints for each tracked person and computes kinematic and interaction features describing hand speed, arm extension, proximity, and relative motion between an aggressor-victim pair. A Random Forest classifier is trained on these descriptors, and a temporal hysteresis filter is applied to stabilize frame-level predictions and reduce spurious alarms. We evaluate the method on a staged dataset and on a disjoint test set…
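The temporal hysteresis filter can be sketched as a two-threshold state machine over per-frame classifier probabilities. The alarm turns on only after several consecutive high-confidence frames and turns off only after several consecutive low-confidence frames. The thresholds (`t_on`, `t_off`), the consecutive-frame counts (`on_frames`, `off_frames`), and the names `HysteresisFilter` and `smooth` are hypothetical; the paper's exact rule is not stated in this summary.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class HysteresisFilter:
    # Hypothetical parameters; real values would be tuned on validation data.
    t_on: float = 0.7      # probability needed to count toward switching on
    t_off: float = 0.4     # probability below which frames count toward switching off
    on_frames: int = 3     # consecutive high frames required to raise the alarm
    off_frames: int = 5    # consecutive low frames required to clear the alarm
    active: bool = field(default=False, init=False)
    _count: int = field(default=0, init=False)

    def update(self, p: float) -> bool:
        if not self.active:
            # Count consecutive frames at or above t_on; any lower frame resets.
            self._count = self._count + 1 if p >= self.t_on else 0
            if self._count >= self.on_frames:
                self.active, self._count = True, 0
        else:
            # Count consecutive frames below t_off; any higher frame resets.
            self._count = self._count + 1 if p < self.t_off else 0
            if self._count >= self.off_frames:
                self.active, self._count = False, 0
        return self.active


def smooth(probs: Iterable[float], **kwargs) -> List[bool]:
    """Apply a fresh hysteresis filter to a sequence of frame probabilities."""
    f = HysteresisFilter(**kwargs)
    return [f.update(p) for p in probs]


if __name__ == "__main__":
    probs = [0.9, 0.2, 0.9, 0.9, 0.9, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3]
    print(smooth(probs))
    # [False, False, False, False, True, True, True, True, True, True, False]
```

With the default settings, the isolated spike at frame 0 never raises an alarm. The 0.5 probability at frame 5 lies between the two thresholds, so it neither triggers nor clears the alarm. This gap between `t_on` and `t_off` is what suppresses flicker around a single decision boundary.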
