Multi-modal Domain Adaptation for REG via Relation Transfer
Yifan Ding, Liqiang Wang, Boqing Gong

TL;DR
This paper introduces a relation transfer method for multi-modal domain adaptation in Referring Expression Grounding, effectively improving cross-domain performance without relying on large-scale pre-training.
Contribution
It proposes a novel relation-tailored approach that enriches and transfers inter-domain relations specifically for multi-modal REG tasks.
Findings
Significant improvement in domain transferability for REG
Enhanced adaptation performance demonstrated in experiments
Effective relation transfer without large-scale pre-training
Abstract
Domain adaptation, which aims to transfer knowledge between domains, has been well studied in many areas such as image classification and object detection. However, for multi-modal tasks, conventional approaches rely on large-scale pre-training, which is often impractical because multi-modal data is difficult to acquire. Domain adaptation, which can efficiently use knowledge from different datasets (domains), is therefore crucial for multi-modal tasks. In this paper, we focus on the Referring Expression Grounding (REG) task, which is to localize the image region described by a natural language expression. Specifically, we propose a novel relation-tailored approach that effectively transfers multi-modal knowledge for the REG problem. Our approach tackles the multi-modal domain adaptation problem by simultaneously enriching…
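To make the REG task definition concrete, here is a minimal sketch of grounding as a selection problem: given an expression embedding and a set of candidate regions, return the box whose feature matches the expression best. This is not the paper's relation transfer method. The feature vectors, the cosine scoring, and the names `cosine`, `ground_expression`, `regions`, and `expr_vec` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ground_expression(expr_vec, regions):
    """Return the box of the candidate region whose feature best matches the expression.

    Hypothetical setup: each region is a dict with a "box" (x1, y1, x2, y2)
    and a "feat" vector living in the same space as expr_vec.
    """
    best = max(regions, key=lambda r: cosine(expr_vec, r["feat"]))
    return best["box"]

# Toy, hand-made data (assumed): two candidate regions and one expression embedding.
regions = [
    {"box": (10, 20, 110, 220), "feat": [0.9, 0.1, 0.0]},
    {"box": (150, 30, 300, 240), "feat": [0.1, 0.8, 0.3]},
]
expr_vec = [0.0, 1.0, 0.2]

predicted_box = ground_expression(expr_vec, regions)
print(predicted_box)
```

In a real system the region features and expression embedding would come from learned visual and language encoders. Domain adaptation concerns how well such a matching function trained on one dataset carries over to another.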
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
