Understand, Compose and Respond - Answering Visual Questions by a Composition of Abstract Procedures
Ben Zion Vatashsky, Shimon Ullman

TL;DR
This paper introduces UnCoRd, a modular system that answers visual questions by decomposing them into structured, compositional procedures based on their abstract logical patterns, enabling flexible reasoning.
Contribution
The paper presents a novel approach that models questions as compositional logical patterns, allowing modular integration of visual detection and external knowledge for improved reasoning.
Findings
Demonstrates the system's ability to represent complex questions.
Shows qualitative analysis of UnCoRd's reasoning capabilities.
Provides a framework for future development in visual question answering.
Abstract
An image-related question defines a specific visual task that must be performed to produce an appropriate answer. The answer may depend on a minor detail in the image and may require complex reasoning and the use of prior knowledge. Humans perform this task in a flexible and robust manner, modularly integrating any novel visual capability with diverse options for elaborating the task. In contrast, current machine approaches cast the problem as an end-to-end learning problem, which lacks such abilities. We present a different approach, inspired by these human capabilities. The approach is based on the compositional structure of the question. The underlying idea is that a question has an abstract representation based on its structure, which is compositional in nature. The question can…
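The abstract's core idea can be illustrated with a minimal Python sketch. A question maps to an abstract, compositional structure, and its parts are handled by separate, replaceable procedures. This is not UnCoRd's implementation. The toy scene, `Node`, `find`, the `CHECKERS` registry, and the `exists`/`count`/`query` procedures are all hypothetical, and a real system would obtain detections from trained visual models rather than from a hand-written list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SceneObject:
    """A detected object; a hypothetical stand-in for real detector output."""
    cls: str
    props: Dict[str, str]


@dataclass
class Node:
    """One abstract step: objects of class `cls` that pass the property filters.

    `same_as` optionally nests a sub-question: (property, reference node).
    """
    cls: str
    filters: Dict[str, str] = field(default_factory=dict)
    same_as: Optional[Tuple[str, "Node"]] = None


# Pluggable property checkers (hypothetical): each could wrap a separate visual model.
Checker = Callable[[SceneObject, str], bool]


def make_checker(prop: str) -> Checker:
    return lambda obj, value: obj.props.get(prop) == value


CHECKERS: Dict[str, Checker] = {p: make_checker(p) for p in ("color", "material", "size")}


def find(scene: List[SceneObject], node: Node) -> List[SceneObject]:
    """Resolve a node to matching objects, solving any nested sub-question first."""
    hits = [o for o in scene if node.cls in ("object", o.cls)]
    for prop, value in node.filters.items():
        hits = [o for o in hits if CHECKERS[prop](o, value)]
    if node.same_as is not None:
        prop, ref_node = node.same_as
        refs = find(scene, ref_node)  # recursive call on the sub-question
        if not refs:
            return []
        ref = refs[0]
        # Keep objects sharing the reference's property value, excluding the reference itself.
        hits = [o for o in hits if o is not ref and o.props.get(prop) == ref.props.get(prop)]
    return hits


# Abstract answer procedures, reused across many concrete questions.
def exists(scene: List[SceneObject], node: Node) -> str:
    return "yes" if find(scene, node) else "no"


def count(scene: List[SceneObject], node: Node) -> str:
    return str(len(find(scene, node)))


def query(prop: str) -> Callable[[List[SceneObject], Node], str]:
    def proc(scene: List[SceneObject], node: Node) -> str:
        hits = find(scene, node)
        return hits[0].props.get(prop, "unknown") if hits else "none"
    return proc


if __name__ == "__main__":
    scene = [
        SceneObject("cube", {"color": "red", "material": "metal"}),
        SceneObject("sphere", {"color": "red", "material": "rubber"}),
        SceneObject("cube", {"color": "blue", "material": "rubber"}),
    ]
    print(exists(scene, Node("sphere", {"color": "red"})))           # yes
    print(count(scene, Node("cube")))                                 # 2
    print(query("material")(scene, Node("cube", {"color": "blue"})))  # rubber
    metal_cube = Node("cube", {"material": "metal"})
    print(count(scene, Node("sphere", same_as=("color", metal_cube))))  # 1
```

In this sketch, adding a new visual capability means registering one more checker, and supporting a new question type means adding one more answer procedure. Neither change touches the rest of the pipeline. This is the kind of modularity the abstract contrasts with end-to-end learning.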
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling
