3D-Aware Multi-Task Learning with Cross-View Correlations for Dense Scene Understanding

Xiaoye Wang; Chen Tang; Xiangyu Yue; Wei-Hong Li

arXiv:2511.20646·cs.CV·November 26, 2025

TL;DR

This paper introduces a 3D-aware multi-task learning framework that feeds cross-view correlations (a cost volume) into the network through a lightweight module, improving dense prediction tasks such as segmentation and depth estimation.

Contribution

The paper proposes a lightweight Cross-view Module (CvM), shared across tasks, that exchanges information across views and captures cross-view correlations as a geometric-consistency signal for multi-task prediction.

Findings

01. Improved accuracy on the NYUv2 and PASCAL-Context datasets.

02. Effective integration of geometric consistency into existing MTL methods.

03. Applicable to both single-view and multi-view data.

Abstract

This paper addresses the challenge of training a single network to jointly perform multiple dense prediction tasks, such as segmentation and depth estimation, i.e., multi-task learning (MTL). Current approaches mainly capture cross-task relations in the 2D image space, often leading to unstructured features lacking 3D-awareness. We argue that 3D-awareness is vital for modeling cross-task correlations essential for comprehensive scene understanding. We propose to address this problem by integrating correlations across views, i.e., cost volume, as geometric consistency in the MTL network. Specifically, we introduce a lightweight Cross-view Module (CvM), shared across tasks, to exchange information across views and capture cross-view correlations, integrated with a feature from the MTL encoder for multi-task predictions. This module is architecture-agnostic and can be applied to both single…
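
To make the term concrete: a cost (correlation) volume scores how well each location in one view matches each location in another. The sketch below is a generic NumPy illustration of an all-pairs correlation volume between two feature maps. It is not the authors' CvM, whose architecture is not described in this excerpt. The function names `correlation_volume` and `best_match`, the `(C, H, W)` feature layout, and the `1/sqrt(C)` scaling are assumptions chosen for illustration.

```python
import numpy as np


def correlation_volume(feat_a, feat_b):
    """All-pairs correlation between two feature maps of shape (C, H, W).

    Returns an array of shape (H, W, H, W) where entry [i, j, k, l] is the
    scaled dot product between the feature at pixel (i, j) in view A and the
    feature at pixel (k, l) in view B. (Hypothetical helper, not from the paper.)
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("both views must share the same (C, H, W) shape")
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w)  # (C, H*W), one column per pixel of view A
    b = feat_b.reshape(c, h * w)  # (C, H*W), one column per pixel of view B
    corr = a.T @ b / np.sqrt(c)   # (H*W, H*W) pairwise similarities
    return corr.reshape(h, w, h, w)


def best_match(volume):
    """For each pixel in view A, return the (row, col) in view B with the
    highest correlation. Output shape is (H, W, 2)."""
    h, w = volume.shape[:2]
    flat = volume.reshape(h, w, -1)          # flatten view-B positions
    idx = flat.argmax(axis=-1)               # best flat index per A pixel
    rows, cols = np.unravel_index(idx, volume.shape[2:])
    return np.stack([rows, cols], axis=-1)
```

With unit-normalized features, `best_match(correlation_volume(a, b))[i, j]` gives the position in view B most similar to pixel `(i, j)` in view A. A learned module would more plausibly consume the volume itself rather than a hard argmax; the argmax here only shows what information the volume carries.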

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning