GOCA: Guided Online Cluster Assignment for Self-Supervised Video Representation Learning

Huseyin Coskun; Alireza Zareian; Joshua L. Moore; Federico Tombari; Chen Wang

arXiv:2207.10158·cs.CV·July 22, 2022

TL;DR

This paper introduces GOCA, a clustering approach for self-supervised video representation learning that combines two views (RGB frames and optical flow) through guided cluster assignment, together with a regularization strategy, to make the learned representations more robust and semantically meaningful.

Contribution

It proposes a clustering strategy in which each view's initial cluster assignment serves as a prior that guides the final cluster assignment of the other view, enforcing similar cluster structures across views and making the video representations more semantically consistent and robust.

Findings

01

Outperforms the state of the art by 7% on UCF video retrieval

02

Achieves a 5% improvement on UCF video classification

03

Demonstrates robustness to noisy inputs in multi-view clustering

Abstract

Clustering is a ubiquitous tool in unsupervised learning. Most existing self-supervised representation learning methods cluster samples based on visually dominant features. While this works well for image-based self-supervision, it often fails for videos, which require understanding motion rather than focusing on the background. Using optical flow as complementary information to RGB can alleviate this problem. However, we observe that naively combining the two views does not provide meaningful gains. In this paper, we propose a principled way to combine the two views. Specifically, we propose a novel clustering strategy in which the initial cluster assignment of each view serves as a prior to guide the final cluster assignment of the other view. This enforces similar cluster structures for both views, and the resulting clusters will be semantically abstract and robust to…
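The core idea above, using each view's initial cluster assignment as a prior for the other view's final assignment, can be illustrated with a small NumPy sketch. This is not the authors' implementation (the official code lives in the seleucia/goca repository). The Sinkhorn-Knopp equal-partition step is borrowed from SwAV-style online clustering, and the rule that injects the prior by adding its log to the temperature-scaled prototype scores is an assumption made for illustration. The function names `sinkhorn` and `guided_assignment`, the parameters `eps` and `n_iters`, and the random stand-in features are likewise hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def sinkhorn(log_q, n_iters=3):
    """SwAV-style Sinkhorn-Knopp: turn (B, K) log-scores into soft assignments
    whose clusters are used roughly equally across the batch.
    Returns a (B, K) matrix whose rows sum to 1."""
    q = np.exp(log_q - log_q.max())  # subtract max for numerical stability
    q /= q.sum()
    b, k = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=0, keepdims=True)  # normalize each cluster column...
        q /= k                             # ...so every column sums to 1/K
        q /= q.sum(axis=1, keepdims=True)  # normalize each sample row...
        q /= b                             # ...so every row sums to 1/B
    return q * b  # rescale so each sample's assignment sums to 1


def guided_assignment(scores_self, q_other_init, eps=0.05, n_iters=3):
    """Hypothetical guided step: bias this view's prototype scores with the
    other view's initial assignment, used as a log-prior, then re-balance."""
    log_prior = np.log(q_other_init + 1e-8)  # small constant avoids log(0)
    return sinkhorn(scores_self / eps + log_prior, n_iters)


rng = np.random.default_rng(0)
B, D, K, eps = 8, 16, 4, 0.05  # batch size, feature dim, clusters, temperature

prototypes = l2_normalize(rng.normal(size=(K, D)))  # shared cluster centers
z_rgb = l2_normalize(rng.normal(size=(B, D)))        # stand-in RGB features
z_flow = l2_normalize(z_rgb + 0.5 * rng.normal(size=(B, D)))  # noisy, correlated flow features

s_rgb = z_rgb @ prototypes.T    # (B, K) cosine scores per view
s_flow = z_flow @ prototypes.T

# Step 1: initial, independent assignment for each view.
q_rgb_init = sinkhorn(s_rgb / eps)
q_flow_init = sinkhorn(s_flow / eps)

# Step 2: each view's final assignment is guided by the other view's initial one.
q_rgb = guided_assignment(s_rgb, q_flow_init, eps)
q_flow = guided_assignment(s_flow, q_rgb_init, eps)

agree_init = np.mean(q_rgb_init.argmax(1) == q_flow_init.argmax(1))
agree_guided = np.mean(q_rgb.argmax(1) == q_flow.argmax(1))
print(f"hard-label agreement: initial {agree_init:.2f}, guided {agree_guided:.2f}")
```

Adding the log of the other view's assignment multiplies the two distributions before re-balancing, so a sample is nudged toward clusters that both views find plausible, while the Sinkhorn step keeps clusters from collapsing onto a single prototype. The paper's actual combination rule and its regularization should be taken from the full text and the official repository rather than from this sketch.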

Code & Models

Repositories

seleucia/goca
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications