MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version
Ronghui Xu, Hanyin Cheng, Chenjuan Guo, Hongfan Gao, Jilin Hu, Sean Bin Yang, Bin Yang

TL;DR
This paper introduces MM-Path, a multi-modal, multi-granularity framework for path representation learning that integrates road network and image data to improve accuracy and generalization in transportation applications.
Contribution
It proposes a novel multi-modal, multi-granularity alignment and fusion strategy for path representation learning, addressing heterogeneity and semantic alignment challenges.
Findings
Enhanced path representation accuracy demonstrated on real-world datasets
Effective multi-modal data fusion improves downstream task performance
Framework outperforms existing single-modality models
Abstract
Developing effective path representations has become increasingly essential across various fields within intelligent transportation. Although pre-trained path representation learning models have shown improved performance, they predominantly focus on the topological structures from single modality data, i.e., road networks, overlooking the geometric and contextual features associated with path-related images, e.g., remote sensing images. Similar to human understanding, integrating information from multiple modalities can provide a more comprehensive view, enhancing both representation accuracy and generalization. However, variations in information granularity impede the semantic alignment of road network-based paths (road paths) and image-based paths (image paths), while the heterogeneity of multi-modal data poses substantial challenges for effective fusion and utilization. In this…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications
