Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

Qingyi Si; Yuanxin Liu; Zheng Lin; Peng Fu; Weiping Wang

arXiv:2210.14558·cs.CV·October 13, 2023·1 cites

TL;DR

This paper explores the joint compression and debiasing of vision-language pre-trained models for visual question answering, demonstrating the existence of sparse, robust subnetworks that outperform debiased full models on out-of-distribution datasets.

Contribution

It introduces a systematic approach to simultaneously compress and debias VLPs by searching for sparse, robust subnetworks tailored for VQA tasks.
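
To make "searching for a sparse subnetwork" concrete, here is a minimal sketch of magnitude-based pruning, one common way to obtain a sparse mask over model weights. It is an illustration under stated assumptions, not the paper's pipeline: this listing does not say which pruning criterion the authors use, the function names `magnitude_mask` and `apply_mask` are hypothetical, and the weights are toy values rather than VLP parameters.

```python
from typing import List


def magnitude_mask(weights: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that zeroes the `sparsity` fraction of
    smallest-magnitude weights (hypothetical helper, for illustration)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by ascending absolute value; the first n_prune are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Keep a weight where the mask is 1, set it to zero where the mask is 0."""
    return [w if m == 1 else 0.0 for w, m in zip(weights, mask)]


# Toy weights, not taken from any real model.
weights = [0.8, -0.05, 0.3, -0.9, 0.01, 0.2]
mask = magnitude_mask(weights, sparsity=0.5)
pruned = apply_mask(weights, mask)
print(mask)
print(pruned)
```

In a debiasing setting, a mask like this would be combined with a training objective that discourages reliance on language priors; how the two stages are ordered and how sparsity is assigned across modules are the design questions the paper studies.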

Findings

01 Existence of sparse, robust subnetworks in VLPs.

02 Sparse subnetworks outperform debiased full models on OOD datasets.

03 Proposed method achieves competitive results with fewer parameters.

Abstract

Despite the excellent performance of vision-language pre-trained models (VLPs) on conventional VQA tasks, they still suffer from two problems. First, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD) data. Second, they are inefficient in terms of memory footprint and computation. Although promising progress has been made on both problems, most existing works tackle them independently. To facilitate the application of VLPs to VQA tasks, it is imperative to jointly study VLP compression and OOD robustness, which has not yet been explored. This paper investigates whether a VLP can be compressed and debiased simultaneously by searching for sparse and robust subnetworks. To this end, we systematically study the design of a training and compression pipeline to search for the subnetworks, as well as the assignment of sparsity to different…

Code & Models

Repositories

phoebussi/compress-robust-vqa (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Learning Cross-Modality Encoder Representations from Transformers (LXMERT)