Less is More: Multimodal Region Representation via Pairwise Inter-view Learning

Min Namgung; Yijun Lin; JangHyeon Lee; Yao-Yi Chiang

arXiv:2505.18178 · cs.LG · May 27, 2025

TL;DR

This paper introduces CooKIE, a multimodal region representation method that uses pairwise inter-view learning to capture both shared and modality-specific (unique) information across multiple geospatial data modalities. It outperforms existing RRL methods while using fewer parameters and FLOPs.

Contribution

The paper proposes CooKIE, a pairwise inter-view learning approach for multimodal region representation that models high-order relationships among modalities without a corresponding increase in model complexity.

Findings

01. CooKIE outperforms existing RRL methods on multiple tasks.

02. It captures multimodal information with fewer parameters and FLOPs.

03. The approach is effective on datasets from New York City and Delhi.

Abstract

With the increasing availability of geospatial datasets, researchers have explored region representation learning (RRL) to analyze complex region characteristics. Recent RRL methods use contrastive learning (CL) to capture shared information between two modalities but often overlook task-relevant unique information specific to each modality. Such modality-specific details can explain region characteristics that shared information alone cannot capture. Bringing information factorization to RRL can address this by factorizing multimodal data into shared and unique information. However, existing factorization approaches focus on two modalities, whereas RRL can benefit from various geospatial data. Extending factorization beyond two modalities is non-trivial because modeling high-order relationships introduces a combinatorial number of learning objectives, increasing model complexity. We…
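To make the complexity argument in the abstract concrete, the following minimal sketch compares how many modality groupings exist under a pairwise scheme versus one that considers every group of two or more modalities. It rests on an assumption: that one learning objective is needed per modality grouping, which is a simplified stand-in and not the paper's actual list of objectives. The function names and the modality names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import comb


def pairwise_views(modalities):
    """Return all unordered modality pairs (the pairwise setting)."""
    return list(combinations(modalities, 2))


def high_order_views(modalities):
    """Return all modality subsets of size >= 2.

    Assumption: each subset stands in for one high-order relationship
    that would need its own learning objective.
    """
    n = len(modalities)
    return [s for k in range(2, n + 1) for s in combinations(modalities, k)]


def closed_form_counts(n):
    """Return (number of pairs, number of subsets of size >= 2) for n modalities."""
    return comb(n, 2), 2 ** n - n - 1


if __name__ == "__main__":
    # Hypothetical modality names, used only for illustration.
    mods = ["poi", "satellite", "mobility", "street_view"]
    for n in range(2, len(mods) + 1):
        subset = mods[:n]
        print(n, len(pairwise_views(subset)), len(high_order_views(subset)))
```

Under this counting assumption, the number of pairs grows quadratically with the number of modalities, while the number of subsets grows exponentially. With two modalities both counts are equal, which matches the abstract's point that the difficulty only appears when factorization is extended beyond two modalities.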

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Image Retrieval and Classification Techniques

Methods: Focus · Contrastive Learning