X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering

Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, and Nanning Zheng

arXiv:2107.11576 · cs.CV · October 5, 2021

TL;DR

This paper introduces X-GGM, a graph generative modeling approach that enhances out-of-distribution generalization in visual question answering by modeling attribute-object relations and stabilizing training.

Contribution

It proposes a novel graph generative training scheme with a gradient distribution consistency loss for improved OOD generalization in VQA.

Findings

1. Achieves state-of-the-art OOD performance on the VQA-CP v2 and GQA-OOD benchmarks.
2. Effectively models attribute-object relations through graph generation.
3. Demonstrates the effectiveness of the proposed components through ablation studies.

Abstract

Encouraging progress has been made in Visual Question Answering (VQA) in recent years, but enabling VQA models to generalize adaptively to out-of-distribution (OOD) samples remains challenging. Intuitively, recombining existing visual concepts (i.e., attributes and objects) can generate compositions that are unseen in the training set, which should help VQA models generalize to OOD samples. In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to model the problem implicitly. X-GGM uses graph generative modeling to iteratively generate a relation matrix and node representations for a predefined graph whose nodes are attribute-object pairs. Furthermore, to alleviate the unstable training issue in graph generative modeling, we propose a gradient…
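
To make the idea of a predefined graph over attribute-object pairs more concrete, here is a minimal Python sketch, assuming a toy vocabulary. It recombines every attribute seen in training with every object seen in training to form the node set, flags the compositions that never co-occur in training, and links two nodes when they share an attribute or an object. The function name `build_pair_graph` and the shared-concept linking rule are hypothetical illustrations, not the authors' code: in X-GGM the relation matrix and node representations are generated iteratively by the model rather than fixed by hand.

```python
from itertools import product


def build_pair_graph(seen_pairs):
    """Build a hypothetical attribute-object pair graph (illustration only).

    Nodes: every attribute x object recomposition of concepts seen in training.
    Edges: two nodes are linked if they share an attribute or an object.
    This hand-crafted rule is an assumption, not X-GGM's generated relation matrix.
    """
    attributes = sorted({a for a, _ in seen_pairs})
    objects = sorted({o for _, o in seen_pairs})

    # Recompose existing concepts; this includes pairs never observed together.
    nodes = list(product(attributes, objects))
    seen = set(seen_pairs)
    unseen = [pair for pair in nodes if pair not in seen]

    # Symmetric 0/1 relation matrix with no self-loops.
    n = len(nodes)
    relation = [[0] * n for _ in range(n)]
    for i, (attr_i, obj_i) in enumerate(nodes):
        for j, (attr_j, obj_j) in enumerate(nodes):
            if i != j and (attr_i == attr_j or obj_i == obj_j):
                relation[i][j] = 1
    return nodes, unseen, relation


if __name__ == "__main__":
    nodes, unseen, relation = build_pair_graph([("red", "apple"), ("green", "car")])
    print(nodes)
    print(unseen)
    for row in relation:
        print(row)
```

With the two training pairs above, the recomposition yields four nodes, two of which ("green apple" and "red car") never appear in training; these are the kind of unseen compositions the abstract argues can help a model generalize to OOD samples.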

Code & Models

Repositories

jingjing12110/x-ggm (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques