X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang, Ziyi Liu, Yifan Liu, Zhixiong Nan, and Nanning Zheng

TL;DR
This paper introduces X-GGM, a graph generative modeling approach that enhances out-of-distribution generalization in visual question answering by modeling attribute-object relations and stabilizing training.
Contribution
It proposes a novel graph generative training scheme with a gradient distribution consistency loss for improved OOD generalization in VQA.
Findings
Achieves state-of-the-art OOD performance on VQA-CP v2 and GQA-OOD benchmarks.
Effectively models attribute-object relations through graph generation.
Demonstrates the effectiveness of the proposed components via ablation studies.
Abstract
Encouraging progress has been made towards Visual Question Answering (VQA) in recent years, but it is still challenging to enable VQA models to adaptively generalize to out-of-distribution (OOD) samples. Intuitively, recompositions of existing visual concepts (\ie, attributes and objects) can generate unseen compositions in the training set, which will promote VQA models to generalize to OOD samples. In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to implicitly model the problem. X-GGM leverages graph generative modeling to iteratively generate a relation matrix and node representations for the predefined graph that utilizes attribute-object pairs as nodes. Furthermore, to alleviate the unstable training issue in graph generative modeling, we propose a gradient…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
