SHED Light on Segmentation for Dense Prediction

Seung Hyun Lee; Sangwoo Mo; Stella X. Yu

arXiv:2601.22529 · cs.CV · February 2, 2026

TL;DR

SHED introduces a hierarchy-aware encoder-decoder architecture that explicitly incorporates segmentation to improve dense prediction tasks such as depth estimation and 3D reconstruction, enhancing boundary sharpness, segment coherence, and interpretability.

Contribution

The paper presents SHED, a novel architecture that integrates hierarchical segmentation reasoning into dense prediction, improving structural consistency and cross-domain generalization without explicit segmentation supervision.

Findings

01

Improves depth boundary sharpness and segment coherence.

02

Enhances 3D reconstruction quality and interpretability.

03

Demonstrates strong cross-domain generalization from synthetic to real-world data.

Abstract

Dense prediction infers per-pixel values from a single image and is fundamental to 3D perception and robotics. Although real-world scenes exhibit strong structure, existing methods treat dense prediction as independent per-pixel estimation, which often results in structural inconsistencies. We propose SHED, a novel encoder-decoder architecture that explicitly enforces a geometric prior by incorporating segmentation into dense prediction. Through bidirectional hierarchical reasoning, segment tokens are hierarchically pooled in the encoder and unpooled in the decoder, reversing the hierarchy. The model is supervised only at the final output, allowing the segment hierarchy to emerge without explicit segmentation supervision. SHED improves depth boundary sharpness and segment coherence while demonstrating strong cross-domain generalization from synthetic to real-world environments. Its hierarchy-aware…
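The pool-then-unpool idea in the abstract can be illustrated with a toy NumPy sketch. Everything below is an assumption for illustration, not SHED's implementation: the soft assignment (a softmax over negative squared distance to hypothetical `centroids`), the weighted-mean `pool`, and an `unpool` that simply reuses the encoder's assignment matrices in reverse order. Feature sizes and the two-level hierarchy are arbitrary.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax along the given axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def soft_assign(tokens, centroids, temperature=1.0):
    # Hypothetical assignment rule: softmax over negative squared distance
    # to each centroid. Returns (n_tokens, n_segments); rows sum to 1.
    d2 = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return softmax(-d2 / temperature, axis=1)

def pool(tokens, assign, eps=1e-8):
    # Each segment token is the assignment-weighted mean of its member tokens.
    mass = assign.sum(axis=0)[:, None]  # (n_segments, 1)
    return (assign.T @ tokens) / (mass + eps)

def unpool(parent_tokens, assign):
    # Reverse one pooling step: each child token receives the
    # assignment-weighted mixture of the parent segment tokens.
    return assign @ parent_tokens

rng = np.random.default_rng(0)
pixels = rng.normal(size=(64, 8))  # 64 toy "pixels" with 8-dim features
c1 = rng.normal(size=(8, 8))       # 8 level-1 centroids (hypothetical; learned in practice)
c2 = rng.normal(size=(3, 8))       # 3 level-2 centroids (hypothetical)

# Encoder: pixels -> fine segment tokens -> coarse segment tokens
A1 = soft_assign(pixels, c1)       # (64, 8)
seg1 = pool(pixels, A1)            # (8, 8)
A2 = soft_assign(seg1, c2)         # (8, 3)
seg2 = pool(seg1, A2)              # (3, 8)

# Decoder: unpool in reverse order, reusing the encoder's assignments
dec1 = unpool(seg2, A2)            # (8, 8)
dec_pixels = unpool(dec1, A1)      # (64, 8)
print(seg1.shape, seg2.shape, dec_pixels.shape)  # (8, 8) (3, 8) (64, 8)
```

Because every assignment row sums to 1, each decoded pixel is a convex combination of the coarse segment tokens, i.e. `dec_pixels` equals `(A1 @ A2) @ seg2`. Reusing the encoder assignments is one simple way to make the decoder mirror the encoder hierarchy; in a setup like the one the abstract describes, whatever produces the assignments would be trained only through the final-output loss rather than from segmentation labels.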

Taxonomy

Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis