Separating Skills and Concepts for Novel Visual Question Answering
Spencer Whitehead, Hui Wu, Heng Ji, Rogerio Feris, Kate Saenko

TL;DR
This paper introduces a method that improves visual question answering (VQA) models by separating skills from concepts inside the model and learning to compose them, so that the models generalize better to skill-concept compositions not seen in training.
Contribution
It proposes a novel contrastive learning approach that disentangles skills and concepts without external annotations, improving compositional generalization in VQA.
Findings
Enhanced performance on compositional generalization tasks
Effective grounding of concepts without external labels
Improved ability to handle novel question compositions
Abstract
Generalization to out-of-distribution data has been a problem for Visual Question Answering (VQA) models. To measure generalization to novel questions, we propose to separate them into "skills" and "concepts". "Skills" are visual tasks, such as counting or attribute recognition, and are applied to "concepts" mentioned in the question, such as objects and people. VQA methods should be able to compose skills and concepts in novel ways, regardless of whether the specific composition has been seen in training, yet we demonstrate that existing models have much to improve upon towards handling new compositions. We present a novel method for learning to compose skills and concepts that separates these two factors implicitly within a model by learning grounded concept representations and disentangling the encoding of skills from that of concepts. We enforce these properties with a novel…
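To make the evaluation idea in the abstract concrete, here is a minimal sketch of a novel-composition split: questions are tagged with a skill and the concepts they mention, and every question containing a held-out (skill, concept) pair is kept out of training. The toy questions, the hand-assigned labels, and the helper names `split_by_composition` and `seen_separately` are all assumptions made for illustration; they are not the authors' labeling procedure or code.

```python
# Hypothetical toy data: skill and concept labels are hand-assigned for illustration.
QUESTIONS = [
    {"q": "How many dogs are there?", "skill": "count", "concepts": ["dog"]},
    {"q": "What color is the car?", "skill": "color", "concepts": ["car"]},
    {"q": "How many cars are parked?", "skill": "count", "concepts": ["car"]},
    {"q": "What color is the dog?", "skill": "color", "concepts": ["dog"]},
    {"q": "How many people are sitting?", "skill": "count", "concepts": ["person"]},
]


def split_by_composition(questions, held_out):
    """Send any question containing a held-out (skill, concept) pair to test."""
    train, test = [], []
    for item in questions:
        pairs = {(item["skill"], concept) for concept in item["concepts"]}
        if pairs & held_out:
            test.append(item)
        else:
            train.append(item)
    return train, test


def seen_separately(train, skill, concept):
    """Check that the skill and the concept each occur somewhere in training."""
    skills = {item["skill"] for item in train}
    concepts = {c for item in train for c in item["concepts"]}
    return skill in skills and concept in concepts


# Hold out the composition "count" + "car"; both parts still appear in training.
held_out = {("count", "car")}
train, test = split_by_composition(QUESTIONS, held_out)
```

In this sketch the model would still see counting questions (about dogs and people) and car questions (about color) during training, so the test question probes whether it can combine a known skill with a known concept in a new way.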
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning
