Scene Consistency Representation Learning for Video Scene Segmentation

Haoqian Wu; Keyu Chen; Yanan Luo; Ruizhi Qiao; Bo Ren; Haozhe Liu; Weicheng Xie; Linlin Shen

arXiv:2205.05487 · cs.CV · May 12, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised learning framework that learns shot representations from unlabeled long-term videos for video scene segmentation, and reports state-of-the-art results.

Contribution

The paper proposes a self-supervised learning (SSL) scheme for scene consistency, supported by data augmentation and shuffling methods, a vanilla temporal model with less inductive bias for assessing shot-feature quality, and a new benchmark for fair evaluation.

Findings

01 Achieved state-of-the-art performance on video scene segmentation

02 Introduced a self-supervised approach for shot representation learning

03 Provided a fairer benchmark for evaluating segmentation methods

Abstract

A long-term video, such as a movie or TV show, is composed of various scenes, each of which represents a series of shots sharing the same semantic story. Spotting the correct scene boundary from the long-term video is a challenging task, since a model must understand the storyline of the video to figure out where a scene starts and ends. To this end, we propose an effective Self-Supervised Learning (SSL) framework to learn better shot representations from unlabeled long-term videos. More specifically, we present an SSL scheme to achieve scene consistency, while exploring considerable data augmentation and shuffling methods to boost the model generalizability. Instead of explicitly learning the scene boundary features as in the previous methods, we introduce a vanilla temporal model with less inductive bias to verify the quality of the shot features. Our method achieves the…
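As a rough illustration of the task described in the abstract, the sketch below treats scene segmentation as a boundary decision between consecutive shots. It is not the authors' method: the paper evaluates its shot features with a learned temporal model, whereas this toy baseline simply thresholds the cosine similarity of neighboring shot features. The function names, the toy 2-D features, and the threshold of 0.5 are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def propose_boundaries(
    shot_features: List[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Return indices i such that a scene boundary is proposed after shot i.

    Hypothetical baseline (not from the paper): a boundary is proposed
    wherever the similarity of consecutive shot features drops below
    `threshold`.
    """
    boundaries = []
    for i in range(len(shot_features) - 1):
        similarity = cosine_similarity(shot_features[i], shot_features[i + 1])
        if similarity < threshold:
            boundaries.append(i)
    return boundaries


# Toy 2-D features: shots 0-2 point one way, shots 3-4 another.
toy_shots = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.2), (0.0, 1.0), (0.1, 0.9)]
print(propose_boundaries(toy_shots))  # prints [2]
```

If shot features from the same scene are close together, which is the intuition behind scene consistency, even a simple rule like this can separate the toy scenes. Real footage would likely need the learned temporal context the abstract refers to.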

Code & Models

Repositories

TencentYoutuResearch/SceneSegmentation-SCRL
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Image and Video Quality Assessment · Human Pose and Action Recognition