ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining
Yucheng Huang, Luping Ji, Xiangwei Jiang, Wen Li, Mao Ye

TL;DR
ToLL is a pretraining framework for 3D Scene Graph (3DSG) generation that counters the geometric shortcut and strengthens topological understanding through anchor-conditioned topological reasoning and multi-view augmentation.
Contribution
The paper proposes a topological layout learning method with anchor-conditioned reasoning and multi-view augmentation to improve 3DSG pretraining, overcoming geometric shortcut issues.
Findings
ToLL improves 3DSG quality over state-of-the-art methods.
The approach effectively mitigates geometric shortcut problems.
Experiments show significant performance gains on specialized datasets.
Abstract
3D Scene Graph (3DSG) generation plays a pivotal role in spatial understanding and affordance perception. To mitigate generalization issues caused by data scarcity, joint-embedding and generative proxy tasks have been proposed to pre-train 3DSG representations on datasets without predicate labels. Generative pre-training usually avoids the semantic corruption that geometric augmentations cause in joint-embedding methods, but it remains vulnerable to the "Geometric Shortcut" problem: exposing dense object spatial and scale priors induces models to trivially reconstruct scenes by interpolating object positions, rather than learning the underlying topological constraints provided by edges. To address this issue, we propose Topological Layout Learning (ToLL), a pretraining framework for 3DSG generation. Specifically, we design an Anchor-Conditioned Topological Geometry Reasoning. It…
