GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei, Hang Wang, Bingbing Ni

TL;DR
This paper introduces the use of geodesic distance as a novel metric in multi-modal learning, effectively capturing complex sample relationships and enhancing model performance in nonlinear spaces.
Contribution
It pioneers the application of geodesic distance in multi-modal learning, proposing a graph-based approach with hierarchical clustering for efficient computation.
Findings
Improved accuracy in downstream tasks
Effective modeling of complex sample relationships
Enhanced discrimination between similar but semantically different samples
Abstract
Geodesic distance serves as a reliable means of measuring distance in nonlinear spaces, and such nonlinear manifolds are prevalent in current multimodal learning. In these scenarios, some samples may be highly similar yet convey different semantics, which makes traditional distance metrics inadequate for distinguishing between positive and negative samples. This paper introduces geodesic distance as a distance metric in multi-modal learning for the first time, using it to mine correlations between samples and to address the limitations of common distance metrics. Our approach incorporates a comprehensive series of strategies to adapt geodesic distance to current multimodal learning. Specifically, we construct a graph structure to represent the adjacency relationships among samples by thresholding the distances between them, and then apply a shortest-path algorithm to obtain…
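The graph-construction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes Euclidean distances between sample embeddings, an edge wherever that distance is at most a hypothetical threshold `eps`, edge weights equal to the distance itself, and Dijkstra's algorithm as the shortest-path routine. The hierarchical-clustering acceleration mentioned in the contribution is omitted. The toy points trace a U shape, so the two endpoints are close in Euclidean space but far apart along the manifold.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Graph = List[List[Tuple[int, float]]]


def pairwise_euclidean(points: Sequence[Point]) -> List[List[float]]:
    # Assumption: plain Euclidean distance between embeddings.
    return [[math.dist(p, q) for q in points] for p in points]


def threshold_graph(dist: List[List[float]], eps: float) -> Graph:
    # Keep an undirected edge i-j only when dist[i][j] <= eps
    # (eps is a hypothetical threshold); weight it by that distance.
    n = len(dist)
    adj: Graph = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and dist[i][j] <= eps:
                adj[i].append((j, dist[i][j]))
    return adj


def dijkstra(adj: Graph, source: int) -> List[float]:
    # Single-source shortest paths; unreachable nodes stay at infinity.
    best = [math.inf] * len(adj)
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < best[v]:
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


def geodesic_distances(points: Sequence[Point], eps: float) -> List[List[float]]:
    # Approximate geodesic distance = shortest path length in the eps-graph.
    adj = threshold_graph(pairwise_euclidean(points), eps)
    return [dijkstra(adj, s) for s in range(len(points))]


# Toy U-shaped "manifold": the endpoints (0,0) and (0,2) are near in
# Euclidean space but far apart when you must walk along the curve.
pts = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
geo = geodesic_distances(pts, eps=1.0)
print(geo[0][6], math.dist(pts[0], pts[6]))  # prints "6.0 2.0"
```

Pairs with no connecting path keep an infinite distance, so the threshold must be large enough for the graph to connect the samples of interest. Computing all-pairs shortest paths this way scales poorly with the number of samples, which is plausibly why the paper adds hierarchical clustering for efficient computation.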
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
