Modality-Aware Shot Relating and Comparing for Video Scene Detection

Jiawei Tan; Hongxing Wang; Kang Dang; Jiaxin Li; Zhilong Ou

arXiv:2412.17238 · cs.CV · December 24, 2024

TL;DR

This paper introduces MASRC, a novel modality-aware approach for video scene detection that leverages visual entity and place semantics to improve shot relation modeling and scene boundary identification.

Contribution

The paper proposes a modality-aware shot relating and comparing method that explicitly models long-term and short-term shot correlations using multi-modal semantics, enhancing scene detection accuracy.

Findings

01

MASRC outperforms existing methods on benchmark datasets.

02

Explicit modeling of multi-modal shot relations improves detection performance.

03

Long-term entity and short-term place semantics effectively distinguish scene boundaries.

Abstract

Video scene detection involves assessing whether each shot and its surroundings belong to the same scene. Achieving this requires meticulously correlating multi-modal cues, e.g., visual entity and place modalities, among shots and comparing semantic changes around each shot. However, most methods treat multi-modal semantics equally and do not examine contextual differences between the two sides of a shot, leading to sub-optimal detection performance. In this paper, we propose the Modality-Aware Shot Relating and Comparing approach (MASRC), which enables relating shots per their own characteristics of visual entity and place modalities, as well as comparing multi-shots similarities to have scene changes explicitly encoded. Specifically, to fully harness the potential of visual entity and place modalities in modeling shot relations, we mine…
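The idea of comparing the context on the two sides of a shot can be illustrated with a minimal sketch. This is not the MASRC method: it is a generic baseline that assumes each shot is already represented by a single feature vector, and the names `boundary_scores`, `window`, and the toy features are hypothetical choices made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def boundary_scores(shot_features, window=2):
    """Score the gap after each shot i (assumed baseline, not MASRC).

    The score is 1 - cosine similarity between the mean feature of up to
    `window` shots ending at shot i and up to `window` shots starting at
    shot i + 1. A higher score means the two sides look less alike.
    """
    scores = []
    for i in range(len(shot_features) - 1):
        left = shot_features[max(0, i - window + 1): i + 1]
        right = shot_features[i + 1: i + 1 + window]
        scores.append(1.0 - cosine(mean_vector(left), mean_vector(right)))
    return scores


# Toy 2-D shot features: three shots from one scene, then two from another.
shots = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = boundary_scores(shots)

# Index of the gap whose two sides differ the most.
best_gap = max(range(len(scores)), key=scores.__getitem__)
print(best_gap)
```

On the toy input, the highest-scoring gap is the one between the third and fourth shots, where the features change direction. A modality-aware variant along the lines the abstract describes would presumably keep separate features per modality (entity and place) rather than one vector per shot, but the details of that design are not given in the text above.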

Code & Models

Repositories

exmorgan-alter/masrc (PyTorch, official)

Videos

Modality-Aware Shot Relating and Comparing for Video Scene Detection · Underline

Taxonomy

Topics: Video Surveillance and Tracking Methods · Video Analysis and Summarization · Digital Media Forensic Detection