SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering

Bruno Souza; Marius Aasan; Helio Pedrini; Adín Ramírez Rivera

arXiv:2310.01842·cs.CV·October 4, 2023

TL;DR

This paper introduces SelfGraphVQA, a self-supervised framework that enhances scene graph-based visual question answering by using augmentation and contrastive learning to improve generalization without relying on annotated scene graphs.

Contribution

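The paper proposes a self-supervised approach to VQA that works on scene graphs extracted by a pre-trained scene graph generator and applies semantically-preserving augmentation to them, reducing reliance on annotated scene graphs and making better use of graph representations.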

Findings

01

SelfGraphVQA improves VQA accuracy using self-supervised graph learning.

02

Augmentation strategies enhance the robustness of scene graph representations.

03

Contrastive learning boosts the effectiveness of scene graph features in VQA.

Abstract

The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and…
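The idea of training on two semantically-preserving views of the same scene graph can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the graph layout (a list of node feature vectors plus directed edges), the augmentation (node reordering with small feature noise), the toy encoder (one round of neighbor averaging, then mean pooling), and the objective (a positive-pair agreement term with no negative samples). The function names `augment`, `encode`, `cosine`, and `view_agreement_loss` are hypothetical.

```python
import math
import random


def augment(nodes, edges, rng, noise=0.01):
    """Return a reordered, lightly-noised copy of a scene graph.

    Reordering nodes does not change what the graph describes, so this is
    one (assumed) example of a semantically-preserving augmentation.
    """
    order = list(range(len(nodes)))
    rng.shuffle(order)
    # Map each old node index to its position in the shuffled graph.
    new_index = {old: new for new, old in enumerate(order)}
    new_nodes = [[x + rng.gauss(0.0, noise) for x in nodes[old]] for old in order]
    new_edges = [(new_index[s], new_index[t]) for s, t in edges]
    return new_nodes, new_edges


def encode(nodes, edges):
    """Toy permutation-invariant encoder (stand-in for a graph neural network)."""
    dim = len(nodes[0])
    agg = [list(v) for v in nodes]  # each node starts with its own features
    deg = [1] * len(nodes)
    for s, t in edges:
        # Add the source node's features to the target node.
        for d in range(dim):
            agg[t][d] += nodes[s][d]
        deg[t] += 1
    hidden = [[x / deg[i] for x in agg[i]] for i in range(len(nodes))]
    # Mean pooling over nodes gives one vector per graph.
    return [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def view_agreement_loss(nodes, edges, rng):
    """1 - cosine similarity between the embeddings of two augmented views."""
    z1 = encode(*augment(nodes, edges, rng))
    z2 = encode(*augment(nodes, edges, rng))
    return 1.0 - cosine(z1, z2)


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up node features
    edges = [(0, 1), (1, 2)]                      # made-up directed relations
    print(view_agreement_loss(nodes, edges, rng))
```

In this sketch the loss stays close to zero because the encoder is invariant to node order and the noise is small; a trainable encoder would be optimized to keep it low across such views.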

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning