Probing the Mid-level Vision Capabilities of Self-Supervised Learning

Xuweiyi Chen; Markus Marks; Zezhou Cheng

arXiv:2411.17474·cs.CV·December 17, 2024

TL;DR

This paper systematically evaluates the mid-level vision capabilities of 22 self-supervised learning (SSL) models across 8 tasks, characterizing their strengths and weaknesses and the factors that influence performance, with the aim of guiding future SSL research.

Contribution

The paper introduces a suite of benchmark protocols and uses them for a comprehensive, controlled evaluation of SSL models on mid-level vision tasks, highlighting the gap between mid-level and high-level task performance and the factors that affect mid-level capabilities.

Findings

01. Weak correlation between mid-level and high-level task performance

02. Some SSL models excel in both mid-level and high-level tasks

03. Pretraining objectives and architectures significantly influence mid-level vision capabilities

Abstract

Mid-level vision capabilities - such as generic object localization and 3D geometric understanding - are not only fundamental to human vision but are also crucial for many real-world applications of computer vision. These abilities emerge with minimal supervision during the early stages of human visual development. Despite their significance, current self-supervised learning (SSL) approaches are primarily designed and evaluated for high-level recognition tasks, leaving their mid-level vision capabilities largely unexamined. In this study, we introduce a suite of benchmark protocols to systematically assess mid-level vision capabilities and present a comprehensive, controlled evaluation of 22 prominent SSL models across 8 mid-level vision tasks. Our experiments reveal a weak correlation between mid-level and high-level task performance. We also identify several SSL methods with highly…
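The correlation claim in the abstract can be made concrete with a small sketch. The text available here does not say which correlation statistic the authors use, so the Pearson coefficient below is an assumption, and the scores are hypothetical placeholders rather than results from the paper. The helper name `pearson` and both score lists are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL placeholder scores, one entry per model. NOT data from the paper.
high_level = [0.70, 0.75, 0.80, 0.72, 0.78]
mid_level = [0.55, 0.48, 0.52, 0.60, 0.50]

r = pearson(high_level, mid_level)
print(f"Pearson r = {r:.3f}")
```

With one score per model on a high-level benchmark and one on a mid-level task, a coefficient near zero would indicate that ranking models by high-level accuracy says little about their mid-level ability. A rank-based statistic such as Spearman's would be an equally reasonable choice when the two scores are on different scales.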
