TL;DR
This paper introduces a learnable extension of Optimal Sequential Grouping (OSG) for video scene detection, combining deep learning with robust optimization to improve accuracy and versatility in segmenting videos into meaningful chapters.
Contribution
The work extends OSG into a learnable framework integrated with deep neural networks, enabling end-to-end training for improved video scene detection performance.
Findings
Learnable OSG outperforms previous methods in accuracy.
Integrated models show enhanced robustness and versatility.
A thorough analysis of the different configurations demonstrates the benefits of the learning approach.
Abstract
Video scene detection is the task of dividing videos into temporal semantic chapters. This is an important preliminary step before attempting to analyze heterogeneous video content. Recently, Optimal Sequential Grouping (OSG) was proposed as a powerful unsupervised solution to a formulation of the video scene detection problem. In this work, we extend the capabilities of OSG to the learning regime. By combining the ability to learn from examples with a robust optimization formulation, we can boost performance and enhance the versatility of the technology. We present a comprehensive analysis of incorporating OSG into deep neural networks under various configurations. These configurations include learning an embedding in a straightforward manner, a tailored loss designed to guide the solution of OSG, and an integrated model where the learning is performed…
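To make the idea of sequential grouping concrete, here is a minimal sketch of dividing a sequence of shot features into contiguous groups with dynamic programming. It is an illustration only, not the authors' implementation, and it does not cover the learnable extension. It assumes an additive cost (the sum of pairwise distances inside each group), Euclidean distances, and a number of scenes known in advance. The function name `sequential_grouping` and its parameters are hypothetical.

```python
import numpy as np


def sequential_grouping(features, num_scenes):
    """Split a sequence into contiguous groups (hypothetical helper).

    Assumption: the cost of a grouping is the sum of pairwise Euclidean
    distances within each group. Returns the exclusive end index of
    every group, in increasing order.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)

    # Pairwise Euclidean distance matrix between all shots.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # 2-D prefix sums so any diagonal block sum is an O(1) lookup.
    prefix = np.zeros((n + 1, n + 1))
    prefix[1:, 1:] = dist.cumsum(axis=0).cumsum(axis=1)

    def block_cost(i, j):
        # Sum of dist[i:j, i:j], the intra-group cost of shots i..j-1.
        return prefix[j, j] - prefix[i, j] - prefix[j, i] + prefix[i, i]

    # best[k, j]: minimal cost of splitting the first j shots into k groups.
    best = np.full((num_scenes + 1, n + 1), np.inf)
    back = np.zeros((num_scenes + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, num_scenes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = best[k - 1, i] + block_cost(i, j)
                if cost < best[k, j]:
                    best[k, j] = cost
                    back[k, j] = i

    # Walk the back-pointers from the end to recover group boundaries.
    boundaries = []
    j = n
    for k in range(num_scenes, 0, -1):
        boundaries.append(j)
        j = back[k, j]
    return sorted(boundaries)


# Toy example: eight 1-D "shot features" forming three obvious groups.
toy_features = np.array([[0.0], [0.0], [0.0], [10.0], [10.0], [20.0], [20.0], [20.0]])
toy_boundaries = sequential_grouping(toy_features, num_scenes=3)
```

In a learned setting, the features passed to such a routine would come from a trained embedding rather than from fixed descriptors; the grouping step itself would stay the same in this sketch.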
