T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Yili Li; Jing Yu; Keke Gai; Bang Liu; Gang Xiong; Qi Wu

arXiv:2408.11432·cs.CV·August 22, 2024

TL;DR

T2VIndexer introduces a generative sequence-to-sequence model for text-video retrieval that significantly reduces retrieval time while maintaining or improving accuracy across multiple datasets.

Contribution

The paper presents a novel generative model-based video indexer that enables constant-time retrieval and introduces encoding and augmentation techniques for semantic video representation.

Findings

01 Reduces retrieval time to 30-50% of the original while improving accuracy.

02 Improves retrieval efficiency on four standard datasets.

03 Maintains high retrieval performance through semantic video encoding.

Abstract

Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate similarity scores, which are then sorted to obtain retrieval results. This approach matches every candidate video against the query, so it incurs a significant time cost that grows notably with the number of candidates. Generative models are common in natural language processing and computer vision, and have been successfully applied to document retrieval, but their application to multimodal retrieval remains unexplored. To enhance retrieval efficiency, in this paper we introduce a model-based video indexer named T2VIndexer, a sequence-to-sequence generative model that directly generates video identifiers and retrieves candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high…
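To make the idea of generating video identifiers more concrete, here is a minimal Python sketch of identifier decoding constrained to a prefix trie. It is an illustration under stated assumptions, not the authors' implementation: `build_trie`, `beam_decode`, `toy_score_token`, the two-token identifiers, and the video names are all hypothetical, and the toy scorer stands in for a trained sequence-to-sequence decoder. The point it shows is that the decoding loop runs once per identifier token, so its cost depends on identifier length, beam size, and token vocabulary rather than on scoring every candidate video.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Identifier = Tuple[int, ...]

# Hypothetical catalogue: each video is assigned a fixed-length token identifier.
VIDEO_IDS: Dict[Identifier, str] = {
    (0, 0): "video_a",
    (0, 1): "video_b",
    (1, 0): "video_c",
    (1, 1): "video_d",
}


def build_trie(identifiers: Sequence[Identifier]) -> dict:
    """Nest dictionaries so each root-to-leaf path spells one valid identifier."""
    root: dict = {}
    for ident in identifiers:
        node = root
        for token in ident:
            node = node.setdefault(token, {})
    return root


def toy_score_token(query: str, prefix: Identifier, token: int) -> float:
    """Stand-in for a decoder log-probability (assumption: keyword lookup)."""
    preferred = {"cooking": (0, 1), "soccer": (1, 0)}
    target = next((ids for word, ids in preferred.items() if word in query), None)
    if target is None:
        return 0.0
    return 0.0 if target[len(prefix)] == token else -1.0


def beam_decode(
    score_token: Callable[[str, Identifier, int], float],
    query: str,
    trie: dict,
    id_length: int,
    beam_size: int = 2,
) -> List[Tuple[Identifier, float]]:
    """Generate identifiers token by token, expanding only prefixes in the trie."""
    beams = [((), 0.0, trie)]
    for _ in range(id_length):  # one step per identifier token, not per video
        candidates = []
        for prefix, score, node in beams:
            for token, child in node.items():
                step = score_token(query, prefix, token)
                candidates.append((prefix + (token,), score + step, child))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return [(prefix, score) for prefix, score, _ in beams]


trie = build_trie(list(VIDEO_IDS))
results = beam_decode(toy_score_token, "a person cooking pasta", trie, id_length=2)
top_identifier = results[0][0]
print(VIDEO_IDS[top_identifier])
```

Restricting expansion to trie children guarantees that every decoded sequence is a valid identifier, which is a common design choice in generative retrieval; whether T2VIndexer constrains decoding in exactly this way is not stated in the text above.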

Code & Models

Repositories

lilidamowang/t2vindexer-generativesearch
jax · Official
