GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning
Huseyin Coskun, Alireza Zareian, Joshua L. Moore, Federico Tombari, Chen Wang

TL;DR
This paper introduces GOCA, a novel clustering approach for self-supervised video representation learning that combines multiple views using guided cluster assignment and regularization to improve robustness and semantic quality.
Contribution
It proposes a new clustering strategy that uses initial cluster assignments as priors to guide multi-view clustering, enhancing semantic consistency and robustness in video representations.
Findings
Outperforms state-of-the-art by 7% on UCF video retrieval
Achieves 5% improvement on UCF video classification
Demonstrates robustness to noisy inputs in multi-view clustering
Abstract
Clustering is a ubiquitous tool in unsupervised learning. Most of the existing self-supervised representation learning methods typically cluster samples based on visually dominant features. While this works well for image-based self-supervision, it often fails for videos, which require understanding motion rather than focusing on background. Using optical flow as complementary information to RGB can alleviate this problem. However, we observe that a naive combination of the two views does not provide meaningful gains. In this paper, we propose a principled way to combine two views. Specifically, we propose a novel clustering strategy where we use the initial cluster assignment of each view as prior to guide the final cluster assignment of the other view. This idea will enforce similar cluster structures for both views, and the formed clusters will be semantically abstract and robust to…
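The guided assignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Sinkhorn-style balanced soft assignment and assumes the prior from the other view is folded in as a weighted log-term. The names `sinkhorn`, `guided_assignment`, `alpha`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sinkhorn(scores, n_iters=50, eps=0.05):
    """Balanced soft cluster assignment via Sinkhorn-Knopp normalization.

    scores: (n_samples, n_clusters) array of sample-to-prototype similarities.
    Returns an array of the same shape whose rows sum to 1.
    """
    q = np.exp((scores - scores.max()) / eps)  # subtract max for numerical stability
    n, k = q.shape
    for _ in range(n_iters):
        q = q / (q.sum(axis=0, keepdims=True) * k)  # each cluster gets total mass 1/k
        q = q / (q.sum(axis=1, keepdims=True) * n)  # each sample gets total mass 1/n
    return q * n  # rescale so every row is a distribution over clusters

def guided_assignment(scores_rgb, scores_flow, alpha=0.5, eps=0.05):
    """Assumed scheme: each view's initial assignment acts as a prior for the other view."""
    tiny = 1e-12  # avoids log(0)
    # Step 1: initial assignment of each view on its own.
    q_rgb0 = sinkhorn(scores_rgb, eps=eps)
    q_flow0 = sinkhorn(scores_flow, eps=eps)
    # Step 2: final assignment of each view, biased by the other view's prior.
    # Adding alpha * eps * log(prior) to the scores multiplies the kernel by prior**alpha.
    q_rgb = sinkhorn(scores_rgb + alpha * eps * np.log(q_flow0 + tiny), eps=eps)
    q_flow = sinkhorn(scores_flow + alpha * eps * np.log(q_rgb0 + tiny), eps=eps)
    return q_rgb, q_flow

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_rgb = rng.uniform(-1, 1, size=(8, 3))   # stand-in RGB similarities
    s_flow = rng.uniform(-1, 1, size=(8, 3))  # stand-in optical-flow similarities
    q_rgb, q_flow = guided_assignment(s_rgb, s_flow)
    print(q_rgb.round(3))
```

In this sketch, `alpha` controls how strongly one view's initial assignment pulls the other view's final assignment; with `alpha=0` each view is clustered independently.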
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
