Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Zhiwu Qing, Shiwei Zhang, Ziyuan Huang, Yi Xu, Xiang Wang, Mingqian Tang, Changxin Gao, Rong Jin, Nong Sang

TL;DR
This paper introduces HiCo, a hierarchical consistency learning framework that learns from untrimmed videos by capturing visual consistency (between clips a short time apart) and topical consistency (between clips a long time apart), yielding stronger video representations than standard contrastive learning.
Contribution
The paper proposes a novel hierarchical consistency learning framework, HiCo, that effectively utilizes untrimmed videos for self-supervised representation learning, surpassing existing trimmed-video-based approaches.
Findings
HiCo produces stronger video representations from untrimmed videos.
It improves representation quality when applied to trimmed videos.
Hierarchical consistency learning outperforms standard contrastive methods.
Abstract
Natural videos provide rich visual contents for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited performance gain. In this work, we aim to learn representations by leveraging more abundant information in untrimmed videos. To this end, we propose to learn a hierarchy of consistencies in videos, i.e., visual consistency and topical consistency, corresponding respectively to clip pairs that tend to be visually similar when separated by a short time span and share similar topics when separated by a long time span. Specifically, a hierarchical consistency learning framework HiCo is presented, where the visually consistent pairs are encouraged to have the same representation through contrastive learning, while the topically consistent pairs are…
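The two kinds of clip pairs described in the abstract can be illustrated with a minimal sketch. It assumes clips are identified by their start times in seconds, and it uses a generic InfoNCE loss as a stand-in for the contrastive objective on visually consistent pairs. The function names, gap thresholds, and temperature are hypothetical and are not taken from the paper. The objective for topically consistent pairs is not shown, because the abstract is cut off before describing it.

```python
import math
import random


def sample_clip_pair(duration, clip_len, min_gap, max_gap, rng):
    """Return start times (t1, t2) of two clips whose starts are
    between min_gap and max_gap seconds apart (hypothetical helper)."""
    last_start = duration - clip_len          # latest valid clip start
    max_gap = min(max_gap, last_start)
    if last_start < 0 or min_gap > max_gap:
        raise ValueError("video too short for the requested gap")
    gap = rng.uniform(min_gap, max_gap)
    t1 = rng.uniform(0.0, last_start - gap)
    return t1, t1 + gap


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: negative log-softmax of the positive's
    similarity among the positive and all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)                           # subtract max for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


rng = random.Random(0)
# Assumed thresholds: "short span" = at most 1 s, "long span" = at least 60 s.
short_pair = sample_clip_pair(600.0, 2.0, 0.0, 1.0, rng)
long_pair = sample_clip_pair(600.0, 2.0, 60.0, 598.0, rng)
```

In this sketch, a short-span pair would serve as the anchor and positive passed to `info_nce`, with clips from other videos as negatives. How long-span pairs enter the training objective is left open here.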
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications
Methods: Contrastive Learning · Contrastive Language-Image Pre-training
