Learning from Multiple Sources for Video Summarisation
Xiatian Zhu, Chen Change Loy, Shaogang Gong

TL;DR
This paper introduces an unsupervised multi-source learning framework that combines visual and non-visual data to improve video summarisation and content understanding in surveillance videos.
Contribution
It proposes a novel method to jointly learn from heterogeneous visual and non-visual data sources, handling discrepancies and missing data effectively.
Findings
Improved video content clustering over state-of-the-art methods.
Accurate inference of missing non-visual semantics in unseen videos.
Validated video summarisation quality through a user study.
Abstract
Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished through analysing imagery-based features. Relying solely on visual cues for public surveillance video understanding is unreliable, since visual observations obtained from public space CCTV video data are often not sufficiently trustworthy and events of interest can be subtle. On the other hand, non-visual data sources such as weather reports and traffic sensory signals are readily accessible but are not explored jointly to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework to learn jointly from both visual and independently-drawn non-visual data sources for discovering meaningful latent structure of surveillance video data. In particular, we investigate ways to cope with discrepant dimension and representation whilst associating…
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Music and Audio Processing
