Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization
Xun Xu, Timothy Hospedales, Shaogang Gong

TL;DR
This paper introduces a novel framework for multi-scene video analysis that clusters related scenes and identifies shared activities, enhancing surveillance tasks like activity understanding, querying, and summarization by leveraging semantic similarities across scenes.
Contribution
The work presents a new distributed multi-scene understanding framework that clusters scenes by how well they explain each other's behaviour and discovers which activities are shared and which are scene-specific, improving surveillance tasks such as activity understanding, cross-scene query, and summarization.
Findings
Improved scene activity understanding across multiple scenes.
Enhanced cross-scene query-by-example accuracy.
Reduced supervised labeling requirements for behavior classification.
Abstract
The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes, or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve any automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing any supervised annotations, between different scenes is however challenging: some scenes are totally unrelated, so any information sharing between them would be detrimental, while others may only share a subset of common activities…
