Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval

Rui Cai; Zhiyu Dong; Jianfeng Dong; Xun Wang

arXiv:2412.13510 · cs.CV · December 19, 2024

TL;DR

This paper introduces DASD, a dynamic adapter framework with semantics disentangling, to improve cross-lingual cross-modal retrieval by adapting to varied caption expressions without target-language labeled data.

Contribution

The paper proposes a novel dynamic adapter with semantics disentangling that adapts to input caption characteristics, enhancing cross-lingual cross-modal retrieval for low-resource languages.

Findings

01 Effective on multiple datasets for image-text and video-text retrieval.

02 Compatible with various vision-language pretraining models.

03 Improves retrieval performance without target-language annotations.

Abstract

Existing cross-modal retrieval methods typically rely on large-scale paired vision-language data. This makes it challenging to efficiently develop a cross-modal retrieval model for under-resourced languages of interest. Therefore, Cross-lingual Cross-modal Retrieval (CCR), which aims to align vision and the low-resource language (the target language) without using any human-labeled target-language data, has gained increasing attention. A common, parameter-efficient solution is to use adapter modules to transfer the vision-language alignment ability of Vision-Language Pretraining (VLP) models from a source language to a target language. However, these adapters are usually static once learned, making it difficult to adapt to target-language captions with varied expressions. To alleviate this, we propose Dynamic Adapter with Semantics Disentangling (DASD), whose parameters…
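
The contrast the abstract draws between static and dynamic adapters can be illustrated with a minimal sketch. This is not taken from the paper or from the huiguanlab/dasd repository: the class names, the tanh nonlinearity, the sigmoid gating scheme, and the random weights standing in for learned parameters are all assumptions chosen for illustration. `StaticAdapter` is the usual residual bottleneck adapter, whose weights are the same for every input; `InputConditionedAdapter` is one hypothetical way to make the update depend on a caption feature `c`.

```python
import math
import random


def rand_matrix(rows, cols, rng, scale=0.5):
    """Random weights standing in for learned parameters (illustration only)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class StaticAdapter:
    """Residual bottleneck adapter: h + W_up(tanh(W_down h)), fixed weights."""

    def __init__(self, dim, hidden, rng):
        self.W_down = rand_matrix(hidden, dim, rng)  # project down to the bottleneck
        self.W_up = rand_matrix(dim, hidden, rng)    # project back up to the model width

    def _squeeze(self, h):
        # tanh is an arbitrary choice here; ReLU or GELU are also common.
        return [math.tanh(a) for a in matvec(self.W_down, h)]

    def __call__(self, h):
        delta = matvec(self.W_up, self._squeeze(h))
        return [a + d for a, d in zip(h, delta)]


class InputConditionedAdapter(StaticAdapter):
    """Hypothetical dynamic variant: a caption feature c gates the bottleneck."""

    def __init__(self, dim, hidden, cond_dim, rng):
        super().__init__(dim, hidden, rng)
        self.W_gate = rand_matrix(hidden, cond_dim, rng)

    def __call__(self, h, c):
        # Per-input gate in (0, 1), generated from the conditioning vector c.
        gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(self.W_gate, c)]
        z = [zi * gi for zi, gi in zip(self._squeeze(h), gate)]
        delta = matvec(self.W_up, z)
        return [a + d for a, d in zip(h, delta)]


rng = random.Random(0)
dim, hidden, cond_dim = 8, 2, 4
h = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # a toy hidden state

static = StaticAdapter(dim, hidden, rng)
dynamic = InputConditionedAdapter(dim, hidden, cond_dim, rng)

caption_a = [1.0, 0.0, 0.0, 0.0]   # two toy "caption characteristic" vectors
caption_b = [0.0, 0.0, 0.0, 1.0]

out_static = static(h)             # same transformation for any caption
out_a = dynamic(h, caption_a)      # transformation depends on the caption feature
out_b = dynamic(h, caption_b)
```

In this sketch the static adapter applies one learned update to every hidden state, while the conditioned variant applies a different update for `caption_a` and `caption_b` even though the hidden state is identical. How DASD actually generates its parameters is not stated in the truncated abstract above, so the gating used here should be read only as a stand-in.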

Code & Models

Repositories

huiguanlab/dasd · PyTorch · Official

Videos

Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval · underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Adapter · ALIGN