DOMR: Establishing Cross-View Segmentation via Dense Object Matching

Jitong Liao; Yulu Gao; Shaofei Huang; Jialin Gao; Jie Lei; Ronghua Liang; Si Liu

arXiv:2508.04050 · cs.CV · August 7, 2025

TL;DR

DOMR is a dense object matching framework that establishes object correspondences between egocentric and exocentric views by modeling positional, semantic, and inter-object relationships.

Contribution

The paper presents the DOMR framework, which combines a novel dense object matcher with a mask refinement head and achieves state-of-the-art results on cross-view segmentation tasks.

Findings

01. Achieves a mean IoU of 49.7% on Ego→Exo

02. Achieves a mean IoU of 55.2% on Exo→Ego

03. Outperforms previous methods by 4.3-5.8% in IoU
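
For readers unfamiliar with the metric behind these numbers, mean IoU can be sketched as below. This is a minimal illustration and not the paper's evaluation code: the function names `mask_iou` and `mean_iou`, the nested-list mask format, and the convention for two empty masks are all assumptions made here.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks.

    Masks are assumed to be equally sized 2D lists of 0/1 values
    (a hypothetical format chosen for this sketch).
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(pairs):
    """Average IoU over a non-empty list of (predicted, target) mask pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


# Toy example: one half-overlapping pair and one exact match.
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # IoU = 1 / 2
    ([[1, 1], [1, 1]], [[1, 1], [1, 1]]),  # IoU = 4 / 4
]
print(mean_iou(pairs))  # 0.75
```

A reported mean IoU of 49.7% corresponds to a value of 0.497 on this scale.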

Abstract

Cross-view object correspondence involves matching objects between egocentric (first-person) and exocentric (third-person) views. It is a critical yet challenging task for visual understanding. In this work, we propose the Dense Object Matching and Refinement (DOMR) framework to establish dense object correspondences across views. The framework centers around the Dense Object Matcher (DOM) module, which jointly models multiple objects. Unlike methods that directly match individual object masks to image features, DOM leverages both positional and semantic relationships among objects to find correspondences. DOM integrates a proposal generation module with a dense matching module that jointly encodes visual, spatial, and semantic cues, explicitly constructing inter-object relationships to achieve dense matching among objects. Furthermore, we combine DOM with a mask refinement head…
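
The difference between matching each object on its own and matching all objects jointly can be illustrated with a toy example. The sketch below is not the DOM module, which is a learned model whose details are not given here. It assumes a precomputed similarity matrix between objects in the two views and compares a per-object argmax with an exhaustive one-to-one assignment; the names `independent_match` and `joint_match` and the similarity values are hypothetical.

```python
from itertools import permutations


def independent_match(sim):
    """Match each source object to its most similar target, ignoring the others."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def joint_match(sim):
    """One-to-one assignment that maximizes total similarity.

    Exhaustive search over permutations, so only suitable for a handful
    of objects. Assumes there are at least as many targets as sources.
    """
    n_src, n_tgt = len(sim), len(sim[0])
    assert n_src <= n_tgt
    best = max(
        permutations(range(n_tgt), n_src),
        key=lambda perm: sum(sim[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# Rows: objects in one view; columns: objects in the other view.
# Values are made up for illustration.
sim = [
    [0.90, 0.80],
    [0.85, 0.10],
]
print(independent_match(sim))  # [0, 0]: both objects claim the same target
print(joint_match(sim))        # [1, 0]: the conflict is resolved jointly
```

In this toy case the per-object argmax assigns two objects to the same target, while the joint assignment uses the second object's strong preference to disambiguate the first. That is the intuition behind modeling multiple objects together rather than one mask at a time.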
