Visual Question Answering based on Formal Logic

Muralikrishnna G. Sethuraman; Ali Payani; Faramarz Fekri; J. Clayton Kerce

arXiv:2111.04785·cs.CV·November 11, 2021

TL;DR

This paper introduces a formal logic-based approach to visual question answering that converts images and questions into symbolic representations for explicit reasoning, achieving high accuracy and interpretability.

Contribution

It presents a framework for VQA that converts images to logical background facts via scene graphs, translates questions to first-order logic clauses with a transformer-based model, and answers them through satisfiability checks.

Findings

1. Achieves 99.6% accuracy on the CLEVR dataset.

2. The reasoning process is highly interpretable, since it is carried out explicitly on symbolic representations.

3. Data-efficient: reaches 99.1% accuracy using only 10% of the training data.

Abstract

Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In VQA, a series of questions is posed based on a set of images and the task at hand is to arrive at the answer. To achieve this, we take a symbolic-reasoning-based approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework where (i) images are converted to logical background facts with the help of scene graphs, (ii) the questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, by using the background knowledge and the…
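
The three steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scene-graph layout, the predicate names, and the helper functions `scene_to_facts` and `satisfiable` are all assumptions made for illustration, and the question clause is written by hand where the paper uses a learned transformer-based translator. Only existentially quantified conjunctions are handled, checked by brute-force substitution.

```python
from itertools import product

# Hypothetical scene graph: objects with attributes plus pairwise relations.
scene = {
    "objects": {
        "o1": {"color": "red", "shape": "cube"},
        "o2": {"color": "blue", "shape": "sphere"},
    },
    "relations": [("left_of", "o1", "o2")],
}

def scene_to_facts(scene):
    """Flatten a scene graph into ground facts such as ('red', 'o1')."""
    facts = set()
    for obj_id, attrs in scene["objects"].items():
        for value in attrs.values():
            facts.add((value, obj_id))
    for name, a, b in scene["relations"]:
        facts.add((name, a, b))
    return facts

def satisfiable(query, variables, facts, constants):
    """Check an existentially quantified conjunction by trying every binding."""
    for binding in product(constants, repeat=len(variables)):
        env = dict(zip(variables, binding))
        # Ground each atom under the binding and look it up in the fact set.
        if all((atom[0], *(env[v] for v in atom[1:])) in facts for atom in query):
            return True
    return False

facts = scene_to_facts(scene)
constants = sorted(scene["objects"])

# Hand-written stand-in (assumption) for the translated question
# "Is there a red cube left of a sphere?":
#   exists X, Y. red(X) & cube(X) & sphere(Y) & left_of(X, Y)
query = [("red", "X"), ("cube", "X"), ("sphere", "Y"), ("left_of", "X", "Y")]
answer = satisfiable(query, ["X", "Y"], facts, constants)
```

Because the answer is derived from explicit facts and an explicit clause, the binding that satisfies the query can be inspected directly, which is the kind of interpretability the summary refers to.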
