Multi-Task Domain Adaptation for Language Grounding with 3D Objects
Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, and Xiaowen Chu

TL;DR
This paper introduces DA4LG, a novel domain adaptation method for language grounding with 3D objects that improves cross-modal alignment and achieves state-of-the-art accuracy in various settings.
Contribution
The paper proposes a visual adapter with multi-task learning for vision-language alignment, addressing cross-domain challenges in 3D object language grounding.
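As a rough illustration of the two ideas named above, the sketch below shows a generic residual bottleneck adapter and a weighted multi-task loss. It is not the DA4LG architecture: the class `BottleneckAdapter`, the function `multi_task_loss`, the layer sizes, the zero initialisation, and the loss weights are all hypothetical choices made for this example, and the summary here does not specify which auxiliary tasks the paper uses.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


class BottleneckAdapter:
    """Generic residual bottleneck adapter (hypothetical, not DA4LG's module).

    Computes x + W_up * relu(W_down * x) on top of frozen backbone features.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection: dim -> bottleneck.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero up-projection (assumed init), so the adapter starts as identity.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = relu(matvec(self.down, x))
        delta = matvec(self.up, hidden)
        return [a + d for a, d in zip(x, delta)]


def multi_task_loss(main_loss, aux_losses, weights):
    """Main-task loss plus a weighted sum of auxiliary-task losses."""
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))


adapter = BottleneckAdapter(dim=4, bottleneck=2)
features = [0.5, -1.0, 2.0, 0.0]      # stand-in for frozen visual features
adapted = adapter(features)           # equals the input before any training
total = multi_task_loss(0.7, [0.2, 0.4], [0.5, 0.25])
```

The zero-initialised up-projection is a common adapter design choice: the adapted features match the pre-trained features at the start of training, and only the small adapter weights need to move to fit the new domain.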
Findings
Achieves 83.8% accuracy in the single-view setting
Achieves 86.8% accuracy in the multi-view setting
Performs competitively on both visual and non-visual language descriptions, independent of the completeness of observation
Abstract
The existing works on object-level language grounding with 3D objects mostly focus on improving performance by utilizing the off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, they have failed to consider exploring the cross-modal representation of language-vision alignment in the cross-domain field. To answer this problem, we propose a novel method called Domain Adaptation for Language Grounding (DA4LG) with 3D objects. Specifically, the proposed DA4LG consists of a visual adapter module with multi-task learning to realize vision-language alignment by comprehensive multimodal feature representation. Experimental results demonstrate that DA4LG competitively performs across visual and non-visual language descriptions, independent of the completeness of observation. DA4LG achieves state-of-the-art performance in the single-view…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Adapter · Focus
