Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen, Zhe Gan, Yu Cheng, Linjie Li, Lawrence Carin, Jingjing Liu

TL;DR
This paper introduces Graph Optimal Transport (GOT), a novel framework that formulates cross-domain alignment as a graph matching problem using optimal transport, improving interpretability and performance across multiple tasks.
Contribution
GOT combines the Wasserstein distance (for node matching) with the Gromov-Wasserstein distance (for edge matching) to produce flexible, interpretable cross-domain alignments inside neural networks, in contrast to attention-based methods, which receive no explicit alignment signal.
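For reference, the two distances named above are usually written as follows for discrete distributions. This is the standard textbook form rather than a transcription from the paper; the weights u and v, the cost functions c, c_1, c_2, the loss L, and the plan T are notation assumed here for illustration.

```latex
% Assumed setup: \mu = \sum_{i=1}^{n} u_i \delta_{x_i}, \quad \nu = \sum_{j=1}^{m} v_j \delta_{y_j}
\begin{align}
\Pi(\mathbf{u}, \mathbf{v}) &= \left\{ \mathbf{T} \in \mathbb{R}_{+}^{n \times m} :
    \mathbf{T}\mathbf{1}_m = \mathbf{u},\ \mathbf{T}^{\top}\mathbf{1}_n = \mathbf{v} \right\} \\
% Wasserstein distance: matches nodes through a cross-domain cost c
\mathcal{D}_{w}(\mu, \nu) &= \min_{\mathbf{T} \in \Pi(\mathbf{u}, \mathbf{v})}
    \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij}\, c(x_i, y_j) \\
% Gromov-Wasserstein distance: matches edges by comparing within-domain costs c_1 and c_2
\mathcal{D}_{gw}(\mu, \nu) &= \min_{\mathbf{T} \in \Pi(\mathbf{u}, \mathbf{v})}
    \sum_{i, i'=1}^{n} \sum_{j, j'=1}^{m} T_{ij}\, T_{i'j'}\,
    L\big(c_1(x_i, x_{i'}),\, c_2(y_j, y_{j'})\big)
\end{align}
```

A common choice for L is the absolute or squared difference of its two arguments. The first objective only needs a cost between entities of the two domains, while the second compares pairwise relations within each domain, which is why it can be read as matching graph structure.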
Findings
GOT outperforms baselines in image-text retrieval and visual question answering (VQA).
GOT improves results in machine translation and text summarization.
The inferred transport plan offers sparse, interpretable alignments.
Abstract
Cross-domain alignment between two sets of entities (e.g., objects in an image, words in a sentence) is fundamental to both computer vision and natural language processing. Existing methods mainly focus on designing advanced attention mechanisms to simulate soft alignment, with no training signals to explicitly encourage alignment. The learned attention matrices are also dense and lack interpretability. We propose Graph Optimal Transport (GOT), a principled framework that germinates from recent advances in Optimal Transport (OT). In GOT, cross-domain alignment is formulated as a graph matching problem, by representing entities as a dynamically-constructed graph. Two types of OT distances are considered: (i) Wasserstein distance (WD) for node (entity) matching; and (ii) Gromov-Wasserstein distance (GWD) for edge (structure) matching. Both WD and GWD can be incorporated into existing…
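The node-matching and edge-matching terms described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine cost, the uniform marginals, the Sinkhorn solver, and all function names (`cosine_cost`, `sinkhorn_plan`, `wasserstein_cost`, `gromov_wasserstein_cost`) are choices made here, and the Gromov-Wasserstein objective is only evaluated at the node-matching plan rather than minimized.

```python
import numpy as np

def cosine_cost(X, Y):
    """Pairwise cosine distance 1 - cos(x_i, y_j) between rows of X and Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T

def sinkhorn_plan(C, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals.

    Assumption: Sinkhorn iterations stand in for whatever solver the paper uses.
    """
    n, m = C.shape
    mu = np.full(n, 1.0 / n)   # uniform weight on each source entity
    nu = np.full(m, 1.0 / m)   # uniform weight on each target entity
    K = np.exp(-C / eps)       # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = nu / (K.T @ u)     # rescale columns toward nu
        u = mu / (K @ v)       # rescale rows toward mu
    return u[:, None] * K * v[None, :]

def wasserstein_cost(C, T):
    """Node-matching cost sum_ij T_ij * C_ij for a given plan T."""
    return float(np.sum(T * C))

def gromov_wasserstein_cost(Cx, Cy, T):
    """Edge-matching cost sum T_ij T_kl |Cx[i,k] - Cy[j,l]| for a fixed plan T."""
    diff = np.abs(Cx[:, None, :, None] - Cy[None, :, None, :])  # shape (n, m, n, m)
    return float(np.einsum("ij,ijkl,kl->", T, diff, T))

# Toy example: 4 "image region" embeddings and 6 "word" embeddings (random data).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Y = rng.normal(size=(6, 8))

C = cosine_cost(X, Y)                  # cross-domain node cost
T = sinkhorn_plan(C)                   # soft alignment between the two sets
wd = wasserstein_cost(C, T)
gwd = gromov_wasserstein_cost(cosine_cost(X, X), cosine_cost(Y, Y), T)
```

In a training setup, quantities like `wd` and `gwd` would be added to the task loss as an alignment regularizer, and `T` can be inspected directly as the alignment between entities. Note that the entropic regularization used in this sketch yields a smoothed plan; a smaller `eps` gives sparser plans at the price of slower convergence.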
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications
Methods: Interpretability
