OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

Ye Liu; Lingfeng Qiao; Di Yin; Zhuoxuan Jiang; Xinghua Jiang; Deqiang Jiang; Bo Ren

arXiv:2207.01241·cs.CV·July 5, 2022

TL;DR

This paper introduces OS-MSL, a unified framework that predicts links between shots to improve scene segmentation and classification by leveraging multimodal data and shot differences.

Contribution

The paper proposes a novel unified link prediction approach for scene segmentation and classification, integrating local and global scene information in a single model.

Findings

1. Outperforms strong baselines on the MovieScenes dataset
2. Effectively leverages multimodal shot features
3. Demonstrates robustness on real-world data

Abstract

Scene segmentation and classification (SSC) serve as a critical step in video structuring analysis. Intuitively, jointly learning these two tasks lets each promote the other by sharing common information. However, scene segmentation focuses more on the local difference between adjacent shots, while classification needs a global representation of scene segments, which can leave the model dominated by one of the two tasks during training. In this paper, to overcome these challenges from an alternate perspective, we unite the two tasks into one through a new formulation, shot link prediction: a link connects two adjacent shots, indicating that they belong to the same scene or category. To this end, we propose a general One Stage Multimodal Sequential Link Framework (OS-MSL) to both distinguish and leverage the two-fold semantics by reforming the two…
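
The link formulation in the abstract can be illustrated with a toy example. The sketch below is not the OS-MSL model; it only shows, under simple assumptions, how per-shot scene labels map to binary links between adjacent shots and how a sequence of links maps back to scene segments. The function names `shots_to_links` and `links_to_segments` and the 0/1 link encoding are hypothetical choices made for illustration, and the category half of the link semantics is left out.

```python
from typing import List, Tuple


def shots_to_links(scene_ids: List[int]) -> List[int]:
    """Derive one binary link per pair of adjacent shots.

    links[i] is 1 when shot i and shot i + 1 share a scene id, else 0.
    (Assumed encoding for illustration, not taken from the paper.)
    """
    return [int(a == b) for a, b in zip(scene_ids, scene_ids[1:])]


def links_to_segments(links: List[int]) -> List[Tuple[int, int]]:
    """Recover scene segments as inclusive (start, end) shot index ranges."""
    n_shots = len(links) + 1
    segments = []
    start = 0
    for i, link in enumerate(links):
        if link == 0:
            # A broken link after shot i closes the current scene.
            segments.append((start, i))
            start = i + 1
    # The last scene always runs to the final shot.
    segments.append((start, n_shots - 1))
    return segments


# Six shots grouped into three scenes.
scene_ids = [0, 0, 0, 1, 1, 2]
links = shots_to_links(scene_ids)
segments = links_to_segments(links)
print(links)     # [1, 1, 0, 1, 0]
print(segments)  # [(0, 2), (3, 4), (5, 5)]
```

In this toy encoding, a sequence of N shots yields N - 1 links, and every 0 marks a scene boundary, so predicting the links is enough to recover the segmentation.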

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Human Pose and Action Recognition