Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

Rujie Wu; Xiaojian Ma; Zhenliang Zhang; Wei Wang; Qing Li; Song-Chun Zhu; Yizhou Wang

arXiv:2310.10207 · cs.LG · January 8, 2025 · 2 cites

TL;DR

Bongard-OpenWorld is a challenging new benchmark for evaluating few-shot reasoning on real-world, open-vocabulary visual concepts, highlighting current limitations of AI models compared to human performance.

Contribution

The paper introduces Bongard-OpenWorld, a novel benchmark for open-world, few-shot visual reasoning with real images, and explores various AI approaches, revealing significant gaps with human reasoning.

Findings

01

Current models reach at most 64% accuracy, well below the 91% reached by humans.

02

Open-world, free-form concepts significantly challenge existing algorithms.

03

Neuro-symbolic reasoning approaches show potential but still fall short.

Abstract

We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): given two sets of images (positive and negative), the model needs to identify the set that query images belong to by inducing the visual concept that is depicted exclusively by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already imposes a significant challenge…
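
The task format described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: it assumes each image has already been reduced to a set of free-form terms (for example by a captioner, which is not shown), and the names `induce_concept` and `classify_query` are hypothetical. The candidate concept is taken to be the terms shared by every positive image, provided that no negative image depicts all of them.

```python
from typing import FrozenSet, List, Set


def induce_concept(positives: List[Set[str]], negatives: List[Set[str]]) -> FrozenSet[str]:
    """Toy concept induction over term sets (an assumption, not the paper's method).

    Returns the terms common to every positive image, and rejects the
    candidate if any negative image also contains all of those terms.
    """
    if not positives:
        raise ValueError("need at least one positive image")
    # Terms that appear in every positive image.
    shared = set(positives[0]).intersection(*positives[1:])
    if not shared:
        raise ValueError("positive images share no terms")
    # The concept must be depicted exclusively by the positive set.
    for neg in negatives:
        if shared <= neg:
            raise ValueError("a negative image also depicts the candidate concept")
    return frozenset(shared)


def classify_query(concept: FrozenSet[str], query_terms: Set[str]) -> bool:
    """A query is labeled positive if it contains every term of the concept."""
    return concept <= query_terms


# Hypothetical term sets standing in for real images.
positives = [{"dog", "running", "grass"}, {"dog", "running", "beach"}]
negatives = [{"dog", "sitting", "grass"}, {"cat", "running", "beach"}]

concept = induce_concept(positives, negatives)
print(sorted(concept))  # ['dog', 'running']
```

Note that each negative here shares part of the concept ("dog" or "running") without depicting the full composition, which is what makes set-level induction necessary rather than single-term matching. Real images do not come with clean term sets, so this sketch only conveys the structure of a problem, not its difficulty.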

Code & Models

Repositories

joyjayng/Bongard-OpenWorld (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications

Methods: None