Modality-Aware Shot Relating and Comparing for Video Scene Detection
Jiawei Tan, Hongxing Wang, Kang Dang, Jiaxin Li, Zhilong Ou

TL;DR
This paper introduces MASRC, a novel modality-aware approach for video scene detection that leverages visual entity and place semantics to improve shot relation modeling and scene boundary identification.
Contribution
The paper proposes a modality-aware shot relating and comparing method that explicitly models long-term and short-term shot correlations using multi-modal semantics, enhancing scene detection accuracy.
Findings
MASRC outperforms existing methods on benchmark datasets.
Explicit modeling of multi-modal shot relations improves detection performance.
Long-term entity and short-term place semantics effectively distinguish scene boundaries.
Abstract
Video scene detection involves assessing whether each shot and its surroundings belong to the same scene. Achieving this requires meticulously correlating multi-modal cues, e.g., visual entity and place modalities, among shots and comparing semantic changes around each shot. However, most methods treat multi-modal semantics equally and do not examine contextual differences between the two sides of a shot, leading to sub-optimal detection performance. In this paper, we propose the Modality-Aware Shot Relating and Comparing approach (MASRC), which relates shots according to the characteristics of their visual entity and place modalities and compares multi-shot similarities so that scene changes are explicitly encoded. Specifically, to fully harness the potential of visual entity and place modalities in modeling shot relations, we mine…
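To make the idea of comparing the contexts on the two sides of a shot concrete, here is a minimal sketch in Python. It is not the MASRC model: it is a simple hand-written baseline that scores the gap after each shot by the average cosine dissimilarity between a few shots on the left and a few on the right. The function names (`cosine`, `boundary_scores`), the `window` parameter, and the toy 2-D features are all assumptions made for illustration; a real system would use learned shot embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def boundary_scores(shot_feats, window=2):
    """Score the gap between shot i and shot i+1 for every i.

    The score is 1 minus the mean cosine similarity between up to
    `window` shots ending at i (left context) and up to `window`
    shots starting at i+1 (right context). Higher means the two
    sides look less alike, i.e. a more likely scene change.
    """
    n = len(shot_feats)
    scores = []
    for i in range(n - 1):
        left = shot_feats[max(0, i - window + 1): i + 1]
        right = shot_feats[i + 1: i + 1 + window]
        sims = [cosine(u, v) for u in left for v in right]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Toy input (hypothetical): three shots of one "scene", then three of another.
toy_feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
             [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
scores = boundary_scores(toy_feats, window=2)
predicted_boundary = scores.index(max(scores))  # index of the shot before the cut
```

On the toy input the highest score falls on the gap after the third shot, where the left and right contexts share nothing. Unlike the approach described in the abstract, this baseline uses a single feature per shot and treats all feature dimensions equally.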
Taxonomy
Topics: Video Surveillance and Tracking Methods · Video Analysis and Summarization · Digital Media Forensic Detection
