Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang

TL;DR
Bongard-OpenWorld is a challenging new benchmark for evaluating few-shot reasoning on real-world, open-vocabulary visual concepts, highlighting current limitations of AI models compared to human performance.
Contribution
The paper introduces Bongard-OpenWorld, a novel benchmark for open-world, few-shot visual reasoning with real images, and evaluates a range of AI approaches on it, revealing a significant gap relative to human reasoning.
Findings
The best current models reach 64% accuracy, well below the 91% achieved by human participants.
Open-world, free-form concepts significantly challenge existing algorithms.
Neuro-symbolic reasoning approaches show potential but still fall short of human performance.
Abstract
We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): given two sets of images (positive and negative), the model needs to identify the set to which the query images belong by inducing the visual concept, which is depicted exclusively by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already imposes a significant challenge…
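To make the episode structure concrete, here is a toy sketch of a Bongard-style problem. It assumes each "image" has already been reduced to a hypothetical set of concept terms (as a tagger or captioner might produce), and it induces the concept with a simple set-intersection rule chosen only for illustration. This is not the paper's method or any of its baselines; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Each "image" is a hypothetical set of concept terms, not real pixels.
    positives: list[frozenset[str]]
    negatives: list[frozenset[str]]
    query: frozenset[str]


def induce_concept(ep: Episode) -> frozenset[str]:
    """Toy induction: the concept is the set of terms shared by all positives."""
    shared = frozenset.intersection(*ep.positives)
    # A valid concept must not be fully depicted by any negative image.
    for neg in ep.negatives:
        if shared <= neg:
            raise ValueError("a negative image also depicts every shared term")
    return shared


def classify(ep: Episode) -> str:
    """Assign the query to the positive set only if it depicts the whole concept."""
    concept = induce_concept(ep)
    return "positive" if concept <= ep.query else "negative"


# Hypothetical episode: the hidden concept is "dog catching frisbee".
positives = [
    frozenset({"dog", "frisbee", "grass", "catching"}),
    frozenset({"dog", "frisbee", "beach", "catching"}),
    frozenset({"dog", "frisbee", "park", "catching", "person"}),
]
# Negatives share some terms but never the full composition.
negatives = [
    frozenset({"dog", "grass"}),
    frozenset({"frisbee", "person", "catching"}),
    frozenset({"dog", "frisbee", "sofa"}),
]
ep_pos = Episode(positives, negatives, frozenset({"dog", "frisbee", "catching", "snow"}))
ep_neg = Episode(positives, negatives, frozenset({"dog", "frisbee", "floor"}))

print(sorted(induce_concept(ep_pos)))  # ['catching', 'dog', 'frisbee']
print(classify(ep_pos), classify(ep_neg))  # positive negative
```

The negatives deliberately overlap with the positives, so no single term suffices and the model must recover the full composition. With real images and an open vocabulary, the perception step that this sketch takes for granted is itself a large part of the difficulty.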
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications
