Barlow constrained optimization for Visual Question Answering

Abhishek Jha; Badri N. Patro; Luc Van Gool; Tinne Tuytelaars

arXiv:2203.03727 · cs.CV · March 9, 2022

TL;DR

This paper introduces COB, a regularization method for VQA that reduces redundancy in the joint embedding space, improving accuracy and interpretability by disentangling semantic concepts.

Contribution

It proposes a novel constrained optimization regularization based on Barlow's theory to enhance the information content of the VQA joint space.
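COB is framed as a constrained optimization rather than a plain weighted penalty. One common way to realize such a formulation is a Lagrangian with a non-negative multiplier updated by projected dual ascent: minimize the task loss subject to a redundancy loss staying below a budget `kappa`. The sketch below is hypothetical and may differ from the paper's exact scheme. It runs on a toy scalar problem where `(x - 3)^2` stands in for the task loss and `x^2` for the redundancy loss. The names `dual_ascent_step` and `solve` and all step sizes are illustrative assumptions.

```python
def dual_ascent_step(x, lam, lr_primal=0.05, lr_dual=0.05, kappa=1.0):
    """One simultaneous primal-descent / projected dual-ascent step on the toy
    problem: minimize (x - 3)^2 subject to x^2 <= kappa.
    (Hypothetical stand-ins: (x - 3)^2 ~ task loss, x^2 ~ redundancy loss.)
    """
    # Gradient of the Lagrangian L(x, lam) = (x - 3)^2 + lam * (x^2 - kappa) w.r.t. x
    grad_x = 2.0 * (x - 3.0) + 2.0 * lam * x
    x_new = x - lr_primal * grad_x
    # Ascend on the constraint violation; project so the multiplier stays >= 0
    lam_new = max(0.0, lam + lr_dual * (x * x - kappa))
    return x_new, lam_new


def solve(steps=5000):
    """Iterate from x = 0, lam = 0 and return the final (x, lam)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        x, lam = dual_ascent_step(x, lam)
    return x, lam


if __name__ == "__main__":
    x, lam = solve()
    # The KKT point of this toy problem is x = 1, lambda = 2
    print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

The multiplier grows only while the constraint is violated, so the regularizer's effective weight is found automatically instead of being hand-tuned.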

Findings

01. Improves VQA accuracy by 1.4% on VQA-CP v2
02. Enhances interpretability of the model
03. Reduces redundancy in the joint embedding space
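"Redundancy" in the third finding can be quantified in several ways. The following minimal sketch assumes it is measured as the mean absolute off-diagonal Pearson correlation between embedding dimensions. This is an illustrative metric, not necessarily the one used in the paper. The helper `redundancy` is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def redundancy(z: np.ndarray) -> float:
    """Mean absolute off-diagonal Pearson correlation between the D feature
    dimensions of a batch of embeddings z with shape (N, D).

    0 means fully decorrelated dimensions; 1 means every pair of dimensions is
    perfectly (anti-)correlated. Constant dimensions produce NaN correlations.
    """
    c = np.corrcoef(z, rowvar=False)          # (D, D) correlation matrix over columns
    mask = ~np.eye(c.shape[0], dtype=bool)    # keep only off-diagonal entries
    return float(np.mean(np.abs(c[mask])))


if __name__ == "__main__":
    decorrelated = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
    duplicated = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
    print(redundancy(decorrelated))  # 0.0
    print(redundancy(duplicated))    # 1.0 up to floating-point rounding
```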

Abstract

Visual question answering is a vision-and-language multimodal task that aims at predicting answers given samples from the question and image modalities. Most recent methods focus on learning a good joint embedding space of images and questions, either by improving the interaction between these two modalities or by making it a more discriminative space. However, how informative this joint space is has not been well explored. In this paper, we propose a novel regularization for VQA models, Constrained Optimization using Barlow's theory (COB), that improves the information content of the joint space by minimizing the redundancy. It reduces the correlation between the learned feature components and thereby disentangles semantic concepts. Our model also aligns the joint space with the answer embedding space, where we consider the answer and image+question as two different 'views' of what in…
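The redundancy reduction and view alignment described in the abstract match the Barlow Twins objective. That objective standardizes two views of a batch, forms their cross-correlation matrix, and pushes its diagonal toward 1 (alignment) and its off-diagonal toward 0 (redundancy reduction). The minimal NumPy sketch below assumes the image+question joint embedding and the answer embedding have already been projected to the same dimension D. The function name `barlow_twins_loss` and the off-diagonal weight `lambd` are illustrative assumptions, not taken from the official repository.

```python
import numpy as np


def barlow_twins_loss(z_a: np.ndarray, z_b: np.ndarray,
                      lambd: float = 5e-3, eps: float = 1e-12) -> float:
    """Barlow Twins-style loss between two views of the same N samples.

    z_a: (N, D) view 1, e.g. joint image+question embeddings (assumed)
    z_b: (N, D) view 2, e.g. answer embeddings (assumed)
    """
    n = z_a.shape[0]
    # Standardize each dimension over the batch (population std, as in batch norm)
    za = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    zb = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = za.T @ zb / n                                   # (D, D) cross-correlation
    diag = np.diagonal(c)
    on_diag = np.sum((diag - 1.0) ** 2)                 # alignment (invariance) term
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)       # redundancy-reduction term
    return float(on_diag + lambd * off_diag)


if __name__ == "__main__":
    # Two orthogonal, zero-mean, unit-variance dimensions
    z = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
    print(barlow_twins_loss(z, z))    # close to 0: views agree, no redundancy
    print(barlow_twins_loss(z, -z))   # about 8: each diagonal entry is -1
```

Only the on-diagonal term requires the two views to agree. The off-diagonal term penalizes correlated feature components, which is the mechanism the abstract credits with disentangling semantic concepts.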

Code & Models

Repositories

abskjha/Barlow-constrained-VQA
PyTorch · Official

Videos

Barlow constrained optimization for Visual Question Answering · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning