Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng

TL;DR
This paper systematically evaluates the mid-level vision capabilities of 22 self-supervised learning (SSL) models across 8 tasks, characterizing their strengths, weaknesses, and the factors that influence performance, with the aim of guiding future SSL research.
Contribution
Introduces benchmark protocols and conducts a comprehensive, controlled evaluation of SSL models on mid-level vision tasks, highlighting the gap between mid-level and high-level task performance and the factors that affect mid-level capabilities.
Findings
Weak correlation between mid-level and high-level task performance
Some SSL models excel in both mid-level and high-level tasks
Pretraining objectives and architectures significantly influence mid-level vision capabilities
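A minimal sketch of how a correlation between mid-level and high-level performance could be measured across models. The summary does not say which statistic the authors use, so both Pearson and Spearman are shown here as an assumption. The model names and scores are hypothetical placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # HYPOTHETICAL placeholder scores, one entry per model.
    # These are illustrative only and do NOT come from the paper.
    high_level = {"model_a": 0.71, "model_b": 0.76, "model_c": 0.80, "model_d": 0.83}
    mid_level = {"model_a": 0.55, "model_b": 0.48, "model_c": 0.60, "model_d": 0.52}

    models = sorted(high_level)
    hi = [high_level[m] for m in models]
    mid = [mid_level[m] for m in models]
    print("Pearson: ", round(pearson(hi, mid), 3))
    print("Spearman:", round(spearman(hi, mid), 3))
```

A coefficient near zero would indicate that ranking models by a high-level metric says little about their ranking on mid-level tasks. Spearman is included because it depends only on rank order, which makes it less sensitive to differences in metric scale between tasks.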
Abstract
Mid-level vision capabilities - such as generic object localization and 3D geometric understanding - are not only fundamental to human vision but are also crucial for many real-world applications of computer vision. These abilities emerge with minimal supervision during the early stages of human visual development. Despite their significance, current self-supervised learning (SSL) approaches are primarily designed and evaluated for high-level recognition tasks, leaving their mid-level vision capabilities largely unexamined. In this study, we introduce a suite of benchmark protocols to systematically assess mid-level vision capabilities and present a comprehensive, controlled evaluation of 22 prominent SSL models across 8 mid-level vision tasks. Our experiments reveal a weak correlation between mid-level and high-level task performance. We also identify several SSL methods with highly…
Taxonomy
Topics: Online and Blended Learning
