CS-VQA: Visual Question Answering with Compressively Sensed Images

Li-Chi Huang; Kuldeep Kulkarni; Anik Jha; Suhas Lohit; Suren Jayasuriya; Pavan Turaga

arXiv:1806.03379·cs.CV·June 12, 2018

TL;DR

This paper demonstrates that visual question answering can be effectively performed directly on compressively sensed images with minimal accuracy loss, enabling resource-efficient applications.

Contribution

It introduces deep-network architectures that perform VQA directly on compressive measurements, showing VQA is feasible without full image reconstruction.

Findings

01. VQA performance degrades minimally with compressive sensing

02. Deep networks can exploit compressive data for accurate VQA

03. Reconstruction-based VQA restores accuracy effectively

Abstract

Visual Question Answering (VQA) is a complex semantic task requiring both natural language processing and visual recognition. In this paper, we explore whether VQA is solvable when images are captured in a sub-Nyquist compressive paradigm. We develop a series of deep-network architectures that exploit available compressive data to increasing degrees of accuracy, and show that VQA is indeed solvable in the compressed domain. Our results show that there is nominal degradation in VQA performance when using compressive measurements, but that accuracy can be recovered when VQA pipelines are used in conjunction with state-of-the-art deep neural networks for CS reconstruction. The results presented yield important implications for resource-constrained VQA applications.
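
For readers unfamiliar with the compressive paradigm the abstract refers to, the following is a minimal sketch of how a sub-Nyquist measurement is commonly modeled: a flattened image block x of n pixels is projected to m < n values by y = Φx. The random Gaussian sensing matrix, the 8x8 block size, the 0.25 measurement rate, and the function names `make_sensing_matrix` and `measure` are illustrative assumptions, not details taken from the paper.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Return an m x n random Gaussian matrix; each row yields one measurement."""
    rng = random.Random(seed)
    scale = 1.0 / (m ** 0.5)  # common normalization choice, assumed here
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute the compressive measurements y = phi * x for a flattened block x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical setup: an 8x8 block flattened to n = 64 pixels,
# sensed at an assumed measurement rate of 0.25, so m = 16.
n = 64
rate = 0.25
m = int(rate * n)

# Synthetic pixel intensities in [0, 1]; a real pipeline would use image data.
block = [((i * 7) % 256) / 255.0 for i in range(n)]

phi = make_sensing_matrix(m, n)
y = measure(phi, block)
print(len(block), "pixels ->", len(y), "measurements")
```

In this picture, a VQA model operating in the compressed domain would consume `y` (or features derived from it) in place of the full pixel block, while a reconstruction-based pipeline would first estimate the block from `y`.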

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning