Benchmarking Omni-Vision Representation through the Lens of Visual Realms
Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

TL;DR
This paper introduces OmniBenchmark, a comprehensive dataset covering diverse visual realms, and proposes ReCo, a contrastive learning method that improves omni-vision representations by leveraging semantic relations.
Contribution
The paper presents a new benchmark dataset for evaluating omni-vision models and a novel contrastive learning framework that encodes semantic relations to enhance generalization.
Findings
ReCo outperforms other supervised contrastive methods.
OmniBenchmark covers most visual realms without semantic overlap.
ReCo improves omni-vision representations across architectures.
Abstract
Though impressive performance has been achieved in specific visual realms (e.g., faces, dogs, and places), an omni-vision representation that generalizes to many natural visual domains is highly desirable. However, existing benchmarks are biased and inefficient for evaluating omni-vision representations: they either include only a few specific realms, or cover most realms at the expense of subsuming numerous datasets with extensive realm overlap. In this paper, we propose the Omni-Realm Benchmark (OmniBenchmark). It includes 21 realm-wise datasets with 7,372 concepts and 1,074,346 images. Without semantic overlap, these datasets cover most visual realms comprehensively and efficiently. In addition, we propose a new supervised contrastive learning framework, Relational Contrastive learning (ReCo), for a better omni-vision representation. Beyond pulling…
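For readers unfamiliar with the supervised contrastive setting the abstract refers to, here is a minimal NumPy sketch of a plain supervised contrastive loss, in which samples sharing a label are pulled together and all others are pushed apart. This is not ReCo: the relational component is not described in the text available here, so it is deliberately left out. The function name `supervised_contrastive_loss`, the temperature value, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Plain supervised contrastive loss over one batch (illustrative, not ReCo)."""
    z = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Positives share the anchor's label; the anchor itself is excluded.
    positives = (y[:, None] == y[None, :]) & not_self

    # Numerically stable log-sum-exp over every other sample in the batch.
    sim_masked = np.where(not_self, sim, -np.inf)
    row_max = sim_masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(sim_masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive in this batch
    # Average log-probability of the positives, per anchor that has any.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = pos_log_prob[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())

# Toy batch (assumed data): two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

On the toy batch, labels that agree with the geometric clusters should give a lower loss than labels that cut across them, which is the behavior a representation trained with this objective is being pushed toward.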
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
Methods: Contrastive Learning
