Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley, Mohan Sridharan

TL;DR
This paper introduces a hybrid architecture that combines deep learning with non-monotonic logical reasoning to improve explainability and accuracy in visual question answering, especially when labeled training data is limited and some domain constraints are not known in advance.
Contribution
It proposes a novel integration of logical reasoning and deep learning for explainable VQA, enabling incremental learning and better handling of incomplete domain knowledge.
Findings
Higher accuracy than end-to-end deep networks on small training datasets
Comparable accuracy to end-to-end deep networks on larger datasets
Enhanced reasoning and planning capabilities for robots
Abstract
State-of-the-art algorithms for many pattern recognition problems rely on deep network models. Training these models requires a large labeled dataset and considerable computational resources. Also, it is difficult to understand the workings of these learned models, which limits their use in some critical applications. Towards addressing these limitations, our architecture draws inspiration from research in cognitive systems, and integrates the principles of commonsense logical reasoning, inductive learning, and deep learning. In the context of answering explanatory questions about scenes and the underlying classification problems, the architecture uses deep networks for extracting features from images and for generating answers to queries. Between these deep networks, it embeds components for non-monotonic logical reasoning with incomplete commonsense domain knowledge, and for decision tree…
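To illustrate what "non-monotonic logical reasoning with incomplete commonsense domain knowledge" means in the abstract, the following is a minimal Python sketch of default reasoning, in which a conclusion drawn from incomplete knowledge is withdrawn once a new fact arrives. It is not the authors' implementation, and the abstract does not say how their reasoner is built. The `Default` class, the `conclusions` function, and the fact names `structure`, `narrow_base`, and `stable` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Default:
    """A default rule: conclude `conclusion` when all prerequisites hold
    and no exception is known. All names here are hypothetical."""
    conclusion: str
    prerequisites: frozenset
    exceptions: frozenset


def conclusions(facts, defaults):
    """Return the conclusions supported by the currently known facts."""
    derived = set()
    for rule in defaults:
        has_prerequisites = rule.prerequisites <= facts
        is_blocked = bool(rule.exceptions & facts)
        if has_prerequisites and not is_blocked:
            derived.add(rule.conclusion)
    return derived


# Hypothetical rule: a structure is assumed stable unless its base is
# known to be narrow.
RULES = [
    Default(
        conclusion="stable",
        prerequisites=frozenset({"structure"}),
        exceptions=frozenset({"narrow_base"}),
    )
]

# With incomplete knowledge, the default conclusion is drawn.
before = conclusions({"structure"}, RULES)

# Learning an additional fact removes the earlier conclusion.
after = conclusions({"structure", "narrow_base"}, RULES)
```

In classical (monotonic) logic, adding a fact can never remove a conclusion. Here `before` contains `stable` while `after` does not, which is the defining behavior of non-monotonic reasoning.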
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
