Object-based reasoning in VQA

Mikyas T. Desta; Larry Chen; Tomasz Kornuta

arXiv:1801.09718 · cs.CV · January 31, 2018

TL;DR

This paper introduces an object-based reasoning approach for Visual Question Answering, combining object detection and reasoning modules, leading to improved accuracy on complex counting tasks in the CLEVR dataset.

Contribution

It presents a novel integration of object detection with reasoning modules for VQA, demonstrating improved performance on complex counting questions.

Findings

1. Achieved accuracy improvements of a few percentage points on CLEVR counting questions.

2. Validated the effectiveness of object-based reasoning in VQA.

3. Showed that high-level object facts facilitate complex reasoning.

Abstract

Visual Question Answering (VQA) is a novel problem domain in which multi-modal inputs must be processed to solve a task posed in natural language. Because solutions inherently require combining visual and natural language processing with abstract reasoning, the problem is considered AI-complete. Recent advances indicate that using high-level, abstract facts extracted from the inputs might facilitate reasoning. Following that direction, we developed a solution combining state-of-the-art object detection and reasoning modules. The results, achieved on the well-balanced CLEVR dataset, confirm this promise and show significant accuracy improvements of a few percentage points on the complex "counting" task.
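
The abstract's central intuition is that once pixels have been turned into discrete object facts, a counting question largely reduces to filtering those facts. The sketch below is a toy symbolic illustration of that intuition only. It is not the authors' pipeline, which pairs a learned object detector with a learned reasoning module. The `SceneObject` record, its CLEVR-style attributes (shape, color, size, material), the helpers `count_objects` and `count_same_attribute`, and the example scene are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object 'fact' that a detector might emit for a CLEVR scene."""
    shape: str
    color: str
    size: str
    material: str


def count_objects(objects: List[SceneObject], **attrs: str) -> int:
    """Count objects whose attributes all match, e.g. color='red', shape='cube'."""
    return sum(
        all(getattr(obj, name) == value for name, value in attrs.items())
        for obj in objects
    )


def count_same_attribute(objects: List[SceneObject], anchor: SceneObject, attr: str) -> int:
    """Count the *other* objects that share `attr` with `anchor` (a relational question)."""
    target = getattr(anchor, attr)
    # Identity check excludes the anchor itself even if another object is equal to it.
    return sum(1 for obj in objects if obj is not anchor and getattr(obj, attr) == target)


# Hypothetical scene, standing in for the output of an object detector.
scene = [
    SceneObject("cube", "red", "large", "metal"),
    SceneObject("sphere", "red", "small", "rubber"),
    SceneObject("cylinder", "blue", "small", "metal"),
    SceneObject("cube", "blue", "large", "rubber"),
]

print(count_objects(scene, shape="cube"))                # 2
print(count_objects(scene, color="red", size="small"))   # 1
print(count_same_attribute(scene, scene[0], "color"))    # 1
```

In a learned system the attribute values would come from detector predictions, which can be wrong, so a filter like this is only as accurate as the extracted facts it operates on.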