Learning Segment Similarity and Alignment in Large-Scale Content Based Video Retrieval

Chen Jiang; Kaiming Huang; Sifeng He; Xudong Yang; Wei Zhang; Xiaobo Zhang; Yuan Cheng; Lei Yang; Qing Wang; Furong Xu; Tan Pan; Wei Chu

arXiv:2309.11091·cs.CV·May 20, 2025


TL;DR

This paper introduces SSAN, an end-to-end trainable network for segment-level video retrieval that improves alignment accuracy and efficiency by using novel keyframe extraction and similarity detection modules.

Contribution

The paper proposes a novel end-to-end trainable network with two new modules, SKE and SPD, enhancing segment alignment accuracy and efficiency in large-scale video retrieval.

Findings

01. SSAN outperforms existing methods in alignment accuracy.

02. SKE reduces storage and computation with minimal accuracy loss.

03. SPD improves temporal localization efficiency.

Abstract

With the explosive growth of web videos in recent years, large-scale Content-Based Video Retrieval (CBVR) has become increasingly essential in video filtering, recommendation, and copyright protection. Segment-level CBVR (S-CBVR) locates the start and end times of similar segments at a finer granularity, which benefits user browsing efficiency and infringement detection, especially in long-video scenarios. The challenge of the S-CBVR task is to achieve high temporal alignment accuracy with efficient computation and low storage consumption. In this paper, we propose a Segment Similarity and Alignment Network (SSAN) to address this challenge; it is the first network trained end-to-end for S-CBVR. SSAN is based on two newly proposed modules in video retrieval: (1) an efficient Self-supervised Keyframe Extraction (SKE) module to reduce redundant frame features, (2) a robust Similarity Pattern…
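To make the S-CBVR task itself concrete (locating the start and end of a shared segment in two videos), here is a minimal sketch of a generic baseline. It is not SSAN and uses neither SKE nor SPD: it assumes per-frame feature vectors are already available, compares them with cosine similarity, and takes the longest run of consecutive matching frames as the aligned segment. The function names and the `threshold` value are hypothetical choices for illustration.

```python
import numpy as np


def cosine_similarity_matrix(query, reference):
    """Frame-to-frame cosine similarity of two (num_frames, dim) arrays."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    return q @ r.T


def longest_diagonal_match(sim, threshold=0.8):
    """Find the longest run of consecutive frame pairs (i, j), (i+1, j+1), ...
    whose similarity is at least `threshold` (an assumed, illustrative value).

    Returns (q_start, q_end, r_start, r_end) with inclusive frame indices,
    or None when no pair reaches the threshold.
    """
    n_q, n_r = sim.shape
    # run[i + 1, j + 1] = length of the matching diagonal run ending at (i, j)
    run = np.zeros((n_q + 1, n_r + 1), dtype=int)
    best_len, best_end = 0, None
    for i in range(n_q):
        for j in range(n_r):
            if sim[i, j] >= threshold:
                run[i + 1, j + 1] = run[i, j] + 1
                if run[i + 1, j + 1] > best_len:
                    best_len = int(run[i + 1, j + 1])
                    best_end = (i, j)
    if best_end is None:
        return None
    i, j = best_end
    return (i - best_len + 1, i, j - best_len + 1, j)


if __name__ == "__main__":
    # Toy features: one-hot vectors stand in for real frame embeddings.
    basis = np.eye(16)
    reference = basis[:10]                                  # 10 reference frames
    query = basis[[10, 11, 12, 4, 5, 6, 7, 13, 14]]         # copies frames 4..7
    sim = cosine_similarity_matrix(query, reference)
    print(longest_diagonal_match(sim))
```

In the toy example, query frames 3 to 6 are copies of reference frames 4 to 7, so the sketch reports that span. Frame indices convert to timestamps by dividing by the frame sampling rate. This brute-force scan compares every frame pair and stores every frame feature, which illustrates the computation and storage cost the abstract names as the core challenge.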
