Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Yicheng Xiao; Zhuoyan Luo; Yong Liu; Yue Ma; Hengwei Bian; Yatai Ji; Yujiu Yang; Xiu Li

arXiv:2311.16464·cs.CV·November 29, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces UVCOM, a unified framework that effectively addresses both Video Moment Retrieval and Highlight Detection by integrating local and global video understanding through multi-aspect contrastive learning.

Contribution

The paper proposes a novel unified framework, UVCOM, that jointly tackles MR and HD with task-specific design and multi-granularity integration, outperforming existing methods.

Findings

01. UVCOM outperforms state-of-the-art methods on multiple datasets.

02. Multi-aspect contrastive learning enhances local and global video understanding.

03. Task-specific design improves the effectiveness of joint MR and HD.

Abstract

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis. Recent approaches treat MR and HD as similar video grounding problems and address them together with transformer-based architecture. However, we observe that the emphasis of MR and HD differs, with one necessitating the perception of local relationships and the other prioritizing the understanding of global contexts. Consequently, the lack of task-specific design will inevitably lead to limitations in associating the intrinsic specialty of two tasks. To tackle the issue, we propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively. By performing progressive integration on intra and inter-modality across multi-granularity, UVCOM achieves the comprehensive understanding in processing a video.…

Code & Models

Repositories

easonxiao-888/uvcom
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Contrastive Learning
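
As a rough illustration of the contrastive learning listed under Methods, the sketch below computes a generic InfoNCE-style loss between one text-query embedding and a set of clip embeddings. It is not UVCOM's multi-aspect objective, which this page does not specify. The function names, the toy vectors, and the temperature of 0.07 are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(query, clips, positive_index, temperature=0.07):
    """Generic InfoNCE loss (hypothetical helper, not the paper's code).

    query: embedding of the text query.
    clips: list of clip embeddings; clips[positive_index] matches the query,
           and every other clip is treated as a negative.
    """
    q = l2_normalize(query)
    # Cosine similarity between the query and each clip, scaled by temperature.
    logits = [dot(q, l2_normalize(c)) / temperature for c in clips]
    # Numerically stable log-sum-exp over all clips.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability assigned to the positive clip.
    return log_denom - logits[positive_index]


if __name__ == "__main__":
    toy_query = [1.0, 0.0]
    toy_clips = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(info_nce(toy_query, toy_clips, positive_index=0))
```

The loss is small when the positive clip is the most similar to the query and grows as negatives become more similar than the positive, which is the basic pressure a contrastive term puts on the shared embedding space.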