Online Deep Clustering with Video Track Consistency
Alessandra Alfani, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

TL;DR
This paper introduces ODCT, an unsupervised deep clustering method that leverages video track consistency to improve visual feature learning, especially under viewpoint changes, outperforming prior methods that ignore temporal information.
Contribution
The paper proposes a novel unsupervised clustering approach using video track information to enhance feature learning, reducing reliance on precise annotations and handling viewpoint variations.
Findings
ODCT outperforms prior methods on two downstream tasks across different datasets.
Using noisy, class-agnostic tracks improves accuracy over precise track annotations.
Leveraging temporal information enhances visual feature robustness.
Abstract
Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback, however, is that they struggle to recognize visual features of the same object when it is simply rotated or the camera perspective changes. To overcome this limitation and at the same time exploit a useful source of supervision, we take into account video object tracks. Following the intuition that two patches in a track should have similar visual representations in a learned feature space, we adopt an unsupervised clustering-based approach and constrain such representations to be labeled as the same category, since they likely belong to the same object or object part. Experimental results on two downstream tasks on different datasets demonstrate the effectiveness of our Online Deep Clustering with…
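The track-consistency idea in the abstract, that patches from the same track should receive the same cluster label, can be illustrated with a small sketch. This is not the paper's procedure: the majority-vote rule, the nearest-centroid assignment, and the names `nearest_centroid` and `track_consistent_labels` are assumptions chosen only to make the constraint concrete.

```python
from collections import Counter
from math import dist


def nearest_centroid(feature, centroids):
    """Index of the centroid closest to `feature` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: dist(feature, centroids[k]))


def track_consistent_labels(features, track_ids, centroids):
    """Give every patch in a track the same pseudo-label.

    Assumption (not from the paper): the shared label is chosen by
    majority vote over the per-patch nearest-centroid assignments.
    """
    per_patch = [nearest_centroid(f, centroids) for f in features]

    # Count the votes for each cluster within each track.
    votes = {}
    for label, track in zip(per_patch, track_ids):
        votes.setdefault(track, Counter())[label] += 1

    # Pick the most frequent cluster per track and broadcast it to its patches.
    track_label = {t: c.most_common(1)[0][0] for t, c in votes.items()}
    return [track_label[t] for t in track_ids]


# Toy data: two clusters, two tracks; the third patch of track "a"
# drifts toward the wrong cluster (e.g. after a viewpoint change).
centroids = [(0.0, 0.0), (10.0, 10.0)]
features = [(0.1, 0.2), (0.3, 0.1), (9.0, 9.5), (9.8, 10.1), (10.2, 9.9)]
track_ids = ["a", "a", "a", "b", "b"]

labels = track_consistent_labels(features, track_ids, centroids)
print(labels)
```

In this toy example the drifting patch of track "a" is pulled back to the label shared by the rest of its track, which is the kind of viewpoint robustness the abstract aims for.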
Taxonomy
Topics: Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
