MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version

Ronghui Xu; Hanyin Cheng; Chenjuan Guo; Hongfan Gao; Jilin Hu; Sean Bin Yang; Bin Yang

arXiv:2411.18428 · cs.LG · January 3, 2025

TL;DR

This paper introduces MM-Path, a multi-modal, multi-granularity framework for path representation learning that integrates road network and image data to improve accuracy and generalization in transportation applications.

Contribution

It proposes a novel multi-modal, multi-granularity alignment and fusion strategy for path representation learning, addressing heterogeneity and semantic alignment challenges.

Findings

1. Enhanced path representation accuracy demonstrated on real-world datasets
2. Effective multi-modal data fusion improves downstream task performance
3. Framework outperforms existing single-modality models

Abstract

Developing effective path representations has become increasingly essential across various fields within intelligent transportation. Although pre-trained path representation learning models have shown improved performance, they predominantly focus on the topological structures from single modality data, i.e., road networks, overlooking the geometric and contextual features associated with path-related images, e.g., remote sensing images. Similar to human understanding, integrating information from multiple modalities can provide a more comprehensive view, enhancing both representation accuracy and generalization. However, variations in information granularity impede the semantic alignment of road network-based paths (road paths) and image-based paths (image paths), while the heterogeneity of multi-modal data poses substantial challenges for effective fusion and utilization. In this…

Code & Models

Repositories

decisionintelligence/mm-path (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications

Methods: Focus