Explicit Reasoning over End-to-End Neural Architectures for Visual Question Answering

Somak Aditya; Yezhou Yang; Chitta Baral

arXiv:1803.08896·cs.CV·March 26, 2018

TL;DR

This paper introduces an explicit reasoning layer using Probabilistic Soft Logic to enhance visual question answering systems, making them more interpretable and capable of handling questions requiring external knowledge.

Contribution

It presents a novel reasoning layer that integrates neural network outputs with background knowledge for improved, interpretable VQA performance.

Findings

01. The reasoning layer improves answer accuracy on VQA tasks.

02. It provides interpretable explanations for the answers.

03. The approach effectively incorporates external knowledge sources.

Abstract

Many vision and language tasks require commonsense reasoning beyond data-driven image and natural language processing. Here we adopt Visual Question Answering (VQA) as an example task, where a system is expected to answer a question in natural language about an image. Current state-of-the-art systems attempted to solve the task using deep neural architectures and achieved promising performance. However, the resulting systems are generally opaque and they struggle in understanding questions for which extra knowledge is required. In this paper, we present an explicit reasoning layer on top of a set of penultimate neural network based systems. The reasoning layer enables reasoning and answering questions where additional knowledge is required, and at the same time provides an interpretable interface to the end users. Specifically, the reasoning layer adopts a Probabilistic Soft Logic (PSL)…
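
The abstract names Probabilistic Soft Logic (PSL) as the basis of the reasoning layer but is cut off before describing it. As a rough illustration of the general PSL idea only, the sketch below scores candidate answers by the weighted "distance to satisfaction" of soft rules, using the Lukasiewicz relaxation of conjunction that PSL is built on. Everything specific here is an assumption made for illustration: the atom names, the truth values (standing in for neural network confidences and knowledge-base relatedness), the rules, and the weights are hypothetical and are not taken from the paper. Real PSL inference solves a convex optimization over continuous truth values; this sketch simply compares discrete candidates.

```python
# Minimal sketch of PSL-style scoring. NOT the authors' implementation:
# all atoms, rules, weights, and truth values below are hypothetical.

def lukasiewicz_and(*truths):
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body_truths, head_truth):
    """Distance to satisfaction of a soft rule `body -> head`.

    The rule is fully satisfied (distance 0) when the head is at least
    as true as the conjunction of the body atoms.
    """
    return max(0.0, lukasiewicz_and(*body_truths) - head_truth)


def total_penalty(rules, evidence, candidate):
    """Weighted sum of rule distances if `candidate` is taken as the answer.

    Simplification (assumption): the chosen candidate gets truth 1.0 and
    every other answer gets truth 0.0.
    """
    penalty = 0.0
    for weight, body_atoms, head_answer in rules:
        body = [evidence[atom] for atom in body_atoms]
        head = 1.0 if head_answer == candidate else 0.0
        penalty += weight * distance_to_satisfaction(body, head)
    return penalty


def best_answer(rules, evidence, candidates):
    """Pick the candidate whose interpretation violates the rules least."""
    return min(candidates, key=lambda c: total_penalty(rules, evidence, c))


# Hypothetical evidence: detector confidences and a knowledge-base score.
evidence = {
    "detected(dog)": 0.9,
    "detected(frisbee)": 0.8,
    "related(dog, frisbee, play)": 0.7,
    "detected(cat)": 0.2,
}

# Hypothetical weighted rules: (weight, body atoms, answer in the head).
rules = [
    (1.0, ["detected(dog)", "detected(frisbee)",
           "related(dog, frisbee, play)"], "playing"),
    (1.0, ["detected(cat)"], "sleeping"),
]

candidates = ["playing", "sleeping"]

if __name__ == "__main__":
    for c in candidates:
        print(c, round(total_penalty(rules, evidence, c), 3))
    print("answer:", best_answer(rules, evidence, candidates))
```

With these made-up numbers, choosing "playing" leaves only the weakly supported cat rule violated, so it incurs the smaller penalty and is selected. The per-rule distances also show which rules supported or opposed an answer, which is the kind of interpretability the abstract attributes to an explicit reasoning layer.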
