From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety
Ganen Sethupathy, Lalit Dumka, Jan Schagen

TL;DR
This paper develops a hybrid edge-based action detection system combining skeleton analysis and vision-language models to improve real-time public safety monitoring under resource constraints.
Contribution
It presents a system-level comparison of motion-based and semantic approaches, demonstrating a hybrid architecture's effectiveness on edge devices.
Findings
Skeleton-based processing offers low latency and privacy benefits.
Vision-language models enable contextual understanding and zero-shot reasoning.
The hybrid system balances speed and semantic depth for public safety applications.
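The trade-off in these findings can be illustrated with a two-stage cascade, in which a cheap skeleton-based motion score decides when the more expensive vision-language model is consulted. This is only a minimal sketch under stated assumptions, not the authors' pipeline: the gating design, the keypoint format, the `motion_score` heuristic, the `threshold` value, and the `describe_scene` callback (a stand-in for a vision-language model call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumed format: one (x, y) pair per skeleton joint, in arbitrary units.
Keypoints = Sequence[Tuple[float, float]]


def motion_score(prev: Keypoints, curr: Keypoints) -> float:
    """Mean per-joint displacement between two consecutive skeleton frames."""
    if not prev or len(prev) != len(curr):
        return 0.0
    total = 0.0
    for (x0, y0), (x1, y1) in zip(prev, curr):
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total / len(curr)


@dataclass
class HybridDetector:
    # Hypothetical stand-in for a vision-language model call; it receives
    # the index of the frame to interpret and returns a text description.
    describe_scene: Callable[[int], str]
    # Assumed gating threshold on the motion score.
    threshold: float = 0.5

    def run(self, frames: List[Keypoints]) -> List[Tuple[int, str]]:
        """Score every frame cheaply; call the semantic stage only on spikes."""
        events: List[Tuple[int, str]] = []
        for i in range(1, len(frames)):
            if motion_score(frames[i - 1], frames[i]) >= self.threshold:
                events.append((i, self.describe_scene(i)))
        return events
```

In this sketch the skeleton stage runs on every frame and operates on keypoints only, while the semantic stage runs only on frames whose motion exceeds the threshold, which is one way to keep average latency low on an edge device.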
Abstract
Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of potentially violent behaviour to support public safety. While automated video analysis has made significant progress, practical deployment remains constrained by latency, privacy, and resource limitations, particularly under edge-computing conditions. This paper presents the design and demonstrator-based deployment of a hybrid edge-based action detection system that combines skeleton-based motion analysis with vision-language models for semantic scene interpretation. Skeleton-based processing enables continuous, privacy-aware monitoring with low computational overhead, while vision-language models provide contextual understanding and zero-shot reasoning capabilities for complex and previously unseen situations. Rather than proposing new recognition models, the contribution…
