VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

Chuang Gan; Yandong Li; Haoxiang Li; Chen Sun; Boqing Gong

arXiv:1708.04686 · cs.CV · August 17, 2017 · 26 cites

TL;DR

This paper introduces VQS, a dataset that links COCO instance segmentations to VQA questions and answers. The links enable supervised attention for VQA and define a new task, question-focused semantic segmentation; the authors report improved VQA results with segmentation-based attention.

Contribution

The paper creates the VQS dataset, which links segmentation and QA annotations, facilitating supervised attention and new research directions in vision-language tasks.

Findings

01. Achieved state-of-the-art results on VQA using segmentation-based attention.

02. Demonstrated the effectiveness of explicit supervision from linked annotations.

03. Explored methods for question-focused semantic segmentation with promising results.
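
To make "segmentation-based attention" supervision concrete, here is a minimal sketch of one way a linked segmentation mask could serve as a training target for an attention map. It is an illustration under stated assumptions, not the paper's implementation: the function names `mask_to_attention_target` and `attention_cross_entropy`, the average-pooling step, the grid size, and the cross-entropy loss are all hypothetical choices made for this example.

```python
import numpy as np


def mask_to_attention_target(mask, grid=(2, 2)):
    """Average-pool a binary H x W mask onto a coarse grid and normalize it.

    Hypothetical helper: the result is a distribution over grid cells that
    puts mass where the linked segment lies.
    """
    mask = np.asarray(mask, dtype=np.float64)
    h, w = mask.shape
    gh, gw = grid
    if h % gh or w % gw:
        raise ValueError("mask size must be divisible by the grid size")
    # Split rows and columns into blocks, then average inside each block.
    pooled = mask.reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))
    total = pooled.sum()
    if total == 0:
        # Assumption: with no linked segment, fall back to a uniform target.
        return np.full(grid, 1.0 / (gh * gw))
    return pooled / total


def attention_cross_entropy(pred_attention, target, eps=1e-12):
    """Cross-entropy between a target distribution and predicted attention."""
    pred = np.asarray(pred_attention, dtype=np.float64)
    return float(-(np.asarray(target) * np.log(pred + eps)).sum())


# Toy example: the linked segment covers the top-left quadrant of a 4 x 4 image.
mask = np.zeros((4, 4))
mask[:2, :2] = 1
target = mask_to_attention_target(mask, grid=(2, 2))

uniform = np.full((2, 2), 0.25)
focused = np.array([[0.97, 0.01], [0.01, 0.01]])
print(attention_cross_entropy(uniform, target))
print(attention_cross_entropy(focused, target))
```

In this toy case the attention map concentrated on the segment's quadrant gets a lower loss than the uniform one, which is the signal a supervised-attention term would add to the usual answer loss.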

Abstract

Rich and dense human labeled datasets are among the main enabling factors for the recent advance on vision-language understanding. Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understandings about the same visual scenes --- and even the same set of images (e.g., of COCO). The popularity of COCO correlates those annotations and tasks. Explicitly linking them up may significantly benefit both individual tasks and the unified vision and language modeling. We present the preliminary work of linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links visual questions and segmentation answers (VQS). They transfer human supervision between the previously separate tasks, offer…

Code & Models

Repositories

Cold-Winter/vqs (caffe2, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques