Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols
Gertjan Burghouts, Fieke Hillerström, Erwin Walraven, Michael van Bekkum, Frank Ruis, Joris Sijs, Jelle van Mil, Judith Dijk

TL;DR
This paper introduces a neuro-symbolic approach that combines first-order logic reasoning with language-vision models to find spatial configurations of objects in images, enabling open-world visual reasoning for tasks such as locating abandoned tools on floors and leaking pipes.
Contribution
It is the first work to integrate neuro-symbolic programming with language-vision models for open-world spatial reasoning in images.
Findings
The approach is effective at finding abandoned tools on floors and leaking pipes.
Most errors stem from biases in language-vision models.
Demonstrates potential for reasoning in open-world visual tasks.
Abstract
We consider the problem of finding spatial configurations of multiple objects in images, e.g., a mobile inspection robot is tasked to localize abandoned tools on the floor. We define the spatial configuration of objects by first-order logic in terms of relations and attributes. A neuro-symbolic program matches the logic formulas to probabilistic object proposals for the given image, provided by language-vision models by querying them for the symbols. This work is the first to combine neuro-symbolic programming (reasoning) and language-vision models (learning) to find spatial configurations of objects in images in an open world setting. We show the effectiveness by finding abandoned tools on floors and leaking pipes. We find that most prediction errors are due to biases in the language-vision model.
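The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Proposal` class, the soft `on` relation based on box containment, the use of a product to combine the conjuncts, and all probabilities below are assumptions made for illustration. In the paper's setting the per-symbol probabilities would come from querying a language-vision model; here they are hard-coded.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Proposal:
    """Hypothetical object proposal: a box plus per-symbol probabilities."""
    box: tuple   # (x_min, y_min, x_max, y_max) in pixels, y grows downward
    probs: dict  # symbol -> probability (assumed output of a zero-shot model)


def p_symbol(proposal, symbol):
    """Probability that the proposal matches the symbol (0.0 if unknown)."""
    return proposal.probs.get(symbol, 0.0)


def p_on(a, b):
    """Soft 'a is on b' (an assumption): fraction of a's box inside b's box."""
    ax0, ay0, ax1, ay1 = a.box
    bx0, by0, bx1, by1 = b.box
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    area_a = (ax1 - ax0) * (ay1 - ay0)
    return (inter_w * inter_h) / area_a if area_a > 0 else 0.0


def score_tool_on_floor(proposals):
    """Score tool(x) AND floor(y) AND on(x, y) for every ordered pair.

    The conjunction is approximated by a product of probabilities,
    which is one possible choice and not necessarily the paper's.
    """
    results = []
    for x, y in permutations(proposals, 2):
        score = p_symbol(x, "tool") * p_symbol(y, "floor") * p_on(x, y)
        results.append((score, x, y))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Made-up proposals: a tool on the floor, the floor itself, a tool on a shelf.
proposals = [
    Proposal((100, 300, 160, 340), {"tool": 0.8, "floor": 0.05}),
    Proposal((0, 200, 640, 480), {"tool": 0.02, "floor": 0.9}),
    Proposal((400, 50, 460, 90), {"tool": 0.6, "floor": 0.05}),
]

best_score, best_x, best_y = score_tool_on_floor(proposals)[0]
print(best_score, best_x.box, best_y.box)
```

In this toy example the first proposal paired with the floor region ranks highest, while the tool on the shelf scores zero because its box does not overlap the floor box. A biased symbol probability (for instance, a floor region wrongly scored as a tool) would propagate directly into the final score, which is consistent with the abstract's observation that most errors stem from the language-vision model.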
Taxonomy
Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems · AI-based Problem Solving and Planning
