DeVLBert: Learning Deconfounded Visio-Linguistic Representations

Shengyu Zhang; Tan Jiang; Tan Wang; Kun Kuang; Zhou Zhao; Jianke Zhu; Jin Yu; Hongxia Yang; Fei Wu

arXiv:2008.06884·cs.CV·October 5, 2020

TL;DR

DeVLBert introduces a causality-inspired framework for visio-linguistic pretraining that reduces dataset bias and improves out-of-domain generalization across multiple vision-language tasks.

Contribution

The paper presents a novel deconfounded pretraining method for visio-linguistic models using intervention-based learning and backdoor adjustment techniques.

Findings

1. Improved performance on Image Retrieval and Zero-shot IR tasks.
2. Enhanced generalization in Visual Question Answering.
3. Effective mitigation of dataset bias in visio-linguistic pretraining.

Abstract

In this paper, we investigate the problem of out-of-domain visio-linguistic pretraining, where the pretraining data distribution differs from that of the downstream data on which the pretrained model will be fine-tuned. Existing methods for this problem are purely likelihood-based, which leads to spurious correlations and hurts generalization when the model is transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the conditional probability of one token (object or word) given another can be high (due to dataset biases) without a robust (causal) relationship between them. To mitigate such dataset biases, we propose a Deconfounded Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform intervention-based learning. We borrow the idea of the backdoor adjustment from the research field of causality and propose several neural-network…
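
To make the difference between a conditional probability and an interventional one concrete, the following is a minimal Python sketch of backdoor adjustment on a toy discrete model. It is not the paper's neural-network implementation: the variables x, y, z, the function names, and all probabilities are invented for illustration. The confounder z drives both x and y, while y ignores x by construction, so any dependence of y on x is spurious.

```python
# Toy illustration only: variables, names, and probabilities are made up
# and are NOT taken from the DeVLBert paper or its repository.

# Confounder z influences both x and y.
P_Z = {0: 0.5, 1: 0.5}            # P(z)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(x=1 | z)
P_Y1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(y=1 | x, z); independent of x by design


def p_x_given_z(x, z):
    """P(x | z) for binary x."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def p_y1_given_xz(x, z):
    """P(y=1 | x, z). The argument x is ignored: x has no causal effect on y."""
    return P_Y1_GIVEN_Z[z]


def conditional(x):
    """Likelihood-style estimate: P(y=1 | x) = sum_z P(y=1 | x, z) P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return sum(
        p_y1_given_xz(x, z) * p_x_given_z(x, z) * P_Z[z] / p_x for z in P_Z
    )


def interventional(x):
    """Backdoor adjustment: P(y=1 | do(x)) = sum_z P(y=1 | x, z) P(z)."""
    return sum(p_y1_given_xz(x, z) * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, conditional(x), interventional(x))
```

With these made-up numbers, the conditional estimate is about 0.74 for x=1 and about 0.26 for x=0, which suggests a strong link between x and y. The backdoor-adjusted estimate is 0.5 for both values of x, correctly showing that x has no causal effect on y once the confounder is averaged over its marginal P(z) rather than P(z | x).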

Code & Models

Repositories

shengyuzhang/DeVLBert (PyTorch, official)

Taxonomy

Methods: Linear Layer · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Dropout · Residual Connection · Attention Dropout · Weight Decay · Softmax · WordPiece