ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining

Yucheng Huang; Luping Ji; Xiangwei Jiang; Wen Li; Mao Ye

arXiv:2603.28178·cs.CV·April 21, 2026

TL;DR

This paper introduces ToLL, a pretraining framework for 3D Scene Graph (3DSG) generation. It counters the "Geometric Shortcut" problem and strengthens topological understanding by combining anchor-conditioned topological reasoning with multi-view augmentation.

Contribution

The paper proposes a topological layout learning method with anchor-conditioned reasoning and multi-view augmentation to improve 3DSG pretraining, overcoming geometric shortcut issues.

Findings

01. ToLL improves 3DSG quality over state-of-the-art methods.

02. The approach effectively mitigates geometric shortcut problems.

03. Experiments show significant performance gains on specialized datasets.

Abstract

3D Scene Graph (3DSG) generation plays a pivotal role in spatial understanding and affordance perception. To mitigate the generalization issues caused by data scarcity, joint-embedding and generative proxy tasks have been proposed to pre-train 3DSG representations on datasets without predicate labels. Generative pre-training usually avoids the semantic corruption that geometric augmentations cause in joint-embedding methods, but it suffers from a problem of its own, the "Geometric Shortcut": exposing dense priors on object position and scale induces models to reconstruct scenes trivially by interpolating object positions, rather than learning the underlying topological constraints provided by edges. To address this issue, we propose Topological Layout Learning (ToLL), a pretraining framework for 3DSG generation. Specifically, we design Anchor-Conditioned Topological Geometry Reasoning. It…
