Weakly supervised cross-domain alignment with optimal transport

Siyang Yuan; Ke Bai; Liqun Chen; Yizhe Zhang; Chenyang Tao; Chunyuan Li; Guoyin Wang; Ricardo Henao; Lawrence Carin

arXiv:2008.06597 · cs.CV · August 18, 2020 · 5 cites

Open Access

TL;DR

This paper introduces a weakly supervised method that uses optimal transport (OT) to improve fine-grained cross-domain alignment between images and text, which improves results even for simpler model architectures.

Contribution

It proposes an optimal-transport-based regularizer for cross-domain alignment that is efficient to compute and can be added to existing models as a drop-in term, advancing weakly supervised vision-language tasks.
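One way to write such a regularizer is sketched below under assumed notation, not taken from the paper: image-region features $x_1,\dots,x_n$, text-token features $y_1,\dots,y_m$, uniform weights on both sides, a cosine-distance cost, and a hypothetical weight $\lambda$ that trades off the task loss against the OT term.

```latex
\begin{aligned}
C_{ij} &= 1 - \frac{x_i^\top y_j}{\lVert x_i \rVert\,\lVert y_j \rVert},
\qquad \mu = \tfrac{1}{n}\mathbf{1}_n,\quad \nu = \tfrac{1}{m}\mathbf{1}_m,\\
\Pi(\mu,\nu) &= \bigl\{\, T \in \mathbb{R}_{\ge 0}^{n \times m} : T\mathbf{1}_m = \mu,\; T^\top \mathbf{1}_n = \nu \,\bigr\},\\
\mathcal{D}_{\mathrm{OT}}(X, Y) &= \min_{T \in \Pi(\mu,\nu)} \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij}\, C_{ij},\\
\mathcal{L} &= \mathcal{L}_{\text{task}} + \lambda\, \mathcal{D}_{\mathrm{OT}}(X, Y).
\end{aligned}
```

Under this formulation, the optimal plan $T^\ast$ acts as a soft correspondence between regions and words that is inferred rather than annotated, so only image-level pairing with the caption is needed as supervision.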

Findings

01

Outperforms state-of-the-art methods on vision-language benchmarks

02

Enables simpler models to achieve competitive results

03

Demonstrates efficiency and effectiveness of OT regularization

Abstract

Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing. This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities, under a weakly-supervised setup, improving performance over state-of-the-art solutions. Our method builds upon recent advances in optimal transport (OT) to resolve the cross-domain matching problem in a principled manner. Formulated as a drop-in regularizer, the proposed OT solution can be efficiently computed and used in combination with other existing approaches. We present empirical evidence to demonstrate the effectiveness of our approach, showing how it enables simpler model architectures to outperform or be comparable with more…
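The abstract describes OT as a drop-in regularizer that aligns image-region features with text-token features. Below is a minimal NumPy sketch of that idea under stated assumptions: uniform mass over regions and tokens, a cosine-distance cost, and entropic OT solved with Sinkhorn iterations as a stand-in solver. The helper names (`cosine_cost`, `sinkhorn_ot`, `ot_regularized_loss`) and the parameters `eps`, `n_iters`, and `lam` are hypothetical; the authors' solver, cost, and weighting may differ.

```python
import numpy as np


def cosine_cost(x, y):
    """Cost C[i, j] = 1 - cos(x_i, y_j) between image regions x (n, d) and tokens y (m, d)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T  # shape (n, m), values in [0, 2]


def sinkhorn_ot(cost, eps=0.1, n_iters=500):
    """Entropic OT with uniform marginals (assumption); returns (transport cost, plan T)."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)  # uniform mass over image regions (assumption)
    nu = np.full(m, 1.0 / m)  # uniform mass over text tokens (assumption)
    K = np.exp(-cost / eps)   # Gibbs kernel; smaller eps -> closer to exact OT
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = mu / (K @ v)      # rescale rows toward mu
        v = nu / (K.T @ u)    # rescale columns so they sum to nu exactly
    T = u[:, None] * K * v[None, :]
    return float(np.sum(T * cost)), T


def ot_regularized_loss(task_loss, img_feats, txt_feats, lam=0.1):
    """Hypothetical drop-in use: add the OT alignment cost of a matched pair to a task loss."""
    d, _ = sinkhorn_ot(cosine_cost(img_feats, txt_feats))
    return task_loss + lam * d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    regions = rng.normal(size=(5, 16))  # stand-in for 5 region features
    tokens = rng.normal(size=(7, 16))   # stand-in for 7 word features
    d, T = sinkhorn_ot(cosine_cost(regions, tokens))
    print(f"OT cost: {d:.3f}; most-aligned token per region: {T.argmax(axis=1)}")
```

In actual training the same computation would be written in an autodiff framework so that the OT term can backpropagate into both encoders; NumPy is used here only to show the arithmetic. The plan `T` doubles as a soft region-to-word alignment, which is what makes the regularizer usable without fine-grained labels.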

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques