DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding

Ting Liu; Xuyang Liu; Siteng Huang; Honggang Chen; Quanjun Yin; Long Qin; Donglin Wang; Yue Hu

arXiv:2405.06217 · cs.CV · June 11, 2024


TL;DR

DARA introduces domain- and relation-aware adapters for visual grounding (VG), enabling efficient transfer learning with minimal parameter updates while achieving state-of-the-art accuracy on VG benchmarks.

Contribution

The paper proposes DARA, a novel parameter-efficient transfer learning (PETL) method built on domain- and relation-aware adapters, which substantially reduces the number of fine-tuned parameters while improving accuracy on visual grounding tasks.

Findings

01. Achieves the best accuracy among the compared methods while tuning only 2.13% of the backbone parameters.

02. Outperforms full fine-tuning and other PETL methods on VG benchmarks.

03. Improves spatial reasoning and domain adaptation in visual grounding.
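The 2.13% figure is a ratio of tuned parameters to backbone parameters. As a rough illustration of how such a ratio is computed, the sketch below counts the parameters of a generic bottleneck adapter (a down-projection and an up-projection, each with a bias) and expresses a total as a percentage of a backbone size. The adapter form, the helper names `bottleneck_adapter_params` and `tuned_percentage`, and every dimension used are hypothetical assumptions for illustration; they are not DARA's actual configuration and do not reproduce the reported number.

```python
def bottleneck_adapter_params(d_model: int, d_bottleneck: int) -> int:
    """Count weights and biases of a hypothetical down/up-projection adapter."""
    down = d_model * d_bottleneck + d_bottleneck  # W_down (r x d) + b_down (r)
    up = d_bottleneck * d_model + d_model         # W_up (d x r) + b_up (d)
    return down + up


def tuned_percentage(tuned_params: int, backbone_params: int) -> float:
    """Return tuned parameters as a percentage of backbone parameters."""
    if backbone_params <= 0:
        raise ValueError("backbone_params must be positive")
    return 100.0 * tuned_params / backbone_params


if __name__ == "__main__":
    # Illustrative numbers only (assumed, not taken from the paper).
    per_layer = bottleneck_adapter_params(d_model=768, d_bottleneck=64)
    total = 12 * per_layer  # assume one adapter in each of 12 layers
    print(per_layer, total, f"{tuned_percentage(total, 100_000_000):.2f}%")
```

With these made-up sizes the script prints `99136 1189632 1.19%`. The count grows linearly in the bottleneck width r, which is why adapter methods can stay in the low single-digit percentages of a large backbone.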

Abstract

Visual grounding (VG) is the challenging task of localizing an object in an image based on a textual description. The recent surge in the scale of VG models has substantially improved performance, but it has also introduced a significant computational burden during fine-tuning. In this paper, we explore applying parameter-efficient transfer learning (PETL) to efficiently transfer pre-trained vision-language knowledge to VG. Specifically, we propose DARA, a novel PETL method comprising Domain-aware Adapters (DA Adapters) and Relation-aware Adapters (RA Adapters) for VG. DA Adapters first adapt intra-modality representations to be more fine-grained for the VG domain. RA Adapters then share weights to bridge the relation between the two modalities, improving spatial reasoning. Empirical results on…
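The abstract distinguishes two adapter types: DA Adapters refine representations within each modality, while RA Adapters use weights shared across the visual and textual branches. The plain-Python sketch below illustrates only that distinction, under assumptions the abstract does not state: each adapter is taken to be a standard residual bottleneck adapter with a zero-initialized up-projection (a common choice, assumed here), both encoders are assumed to share one hidden size so a single adapter can serve both, and the DA-then-RA order is applied directly to token vectors. The names `BottleneckAdapter` and `DaraLayerSketch` are hypothetical; the official repository is the reference for the actual placement and design.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector, bias: Vector) -> Vector:
    """Return m @ v + bias for a row-major matrix m."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(m, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


class BottleneckAdapter:
    """Hypothetical residual adapter: x + W_up relu(W_down x + b_down) + b_up."""

    def __init__(self, d_model: int, d_bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection (d_bottleneck x d_model).
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(d_bottleneck)]
        self.b_down = [0.0] * d_bottleneck
        # Zero up-projection (d_model x d_bottleneck): starts as an identity map.
        self.w_up = [[0.0] * d_bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def num_params(self) -> int:
        return (len(self.w_down) * len(self.w_down[0]) + len(self.b_down)
                + len(self.w_up) * len(self.w_up[0]) + len(self.b_up))

    def __call__(self, x: Vector) -> Vector:
        hidden = relu(matvec(self.w_down, x, self.b_down))
        delta = matvec(self.w_up, hidden, self.b_up)
        return [xi + di for xi, di in zip(x, delta)]  # residual connection


class DaraLayerSketch:
    """Hypothetical layer: per-modality DA adapters, then one shared RA adapter."""

    def __init__(self, d_model: int, d_bottleneck: int) -> None:
        # Domain-aware: separate weights for each modality.
        self.da_visual = BottleneckAdapter(d_model, d_bottleneck, seed=1)
        self.da_text = BottleneckAdapter(d_model, d_bottleneck, seed=2)
        # Relation-aware: a single set of weights used by both modalities.
        self.ra_shared = BottleneckAdapter(d_model, d_bottleneck, seed=3)

    def num_trainable_params(self) -> int:
        # The shared RA adapter is counted only once.
        return (self.da_visual.num_params() + self.da_text.num_params()
                + self.ra_shared.num_params())

    def __call__(self, visual_tokens: List[Vector], text_tokens: List[Vector]):
        visual_out = [self.ra_shared(self.da_visual(t)) for t in visual_tokens]
        text_out = [self.ra_shared(self.da_text(t)) for t in text_tokens]
        return visual_out, text_out


if __name__ == "__main__":
    layer = DaraLayerSketch(d_model=4, d_bottleneck=2)
    vis, txt = layer([[1.0, 2.0, 3.0, 4.0]], [[0.5, -0.5, 0.0, 1.0]])
    print(vis, txt, layer.num_trainable_params())
```

Because the up-projections start at zero, the layer initially returns its inputs unchanged, so tuning begins from the frozen backbone's behaviour. Sharing the RA weights also means one update to `ra_shared` affects both modalities at once, and with `d_model=4, d_bottleneck=2` the layer holds 3 × 22 = 66 trainable parameters instead of 88 for four independent adapters.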


Code & Models

Repositories

liuting20/dara
PyTorch · Official


Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Vision and Imaging