TL;DR
This paper investigates a failure mode called Meltdown in 3D diffusion transformers conditioned on point clouds, revealing how small perturbations can cause catastrophic reconstruction failures and proposing PowerRemap as a mitigation.
Contribution
The study identifies the mechanism behind Meltdown failures, linking circuit-level attention issues to trajectory bifurcations, and introduces PowerRemap to effectively mitigate these failures.
Findings
Meltdown occurs in 89.9-100% of tested shapes across architectures.
PowerRemap rescues 98.3% of shapes on WaLa and 84.6% on Make-a-Shape.
Failure is linked to low-rank, directional perturbations in the diffusion process.
Abstract
Sparse point clouds are a common input modality for 3D surface reconstruction, including in safety-critical settings such as surgical navigation and autonomous perception. Recent point-cloud-conditioned 3D diffusion transformers achieve state-of-the-art results in this regime by leveraging learned priors. We show that these models can fail catastrophically under realistic input variation, and present a mechanistic case study of why. We identify a failure mode we call Meltdown: tiny on-surface perturbations to a sparse input point cloud can fracture the reconstructed output into hundreds of disconnected pieces. Adversarial search recovers Meltdown in 89.9-100% of shapes across the two open-weight state-of-the-art architectures we study (WaLa, Make-a-Shape) on real-world datasets (GSO, SimJEB) and under both DDPM and DDIM sampling. We trace Meltdown along the forward pass: it is governed…
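One way to make "fractured into hundreds of disconnected pieces" concrete is to count the connected components of the reconstructed mesh. The sketch below is illustrative only and is not taken from the paper: it assumes the output is a triangle mesh given as vertex-index triples, and the helper name `count_components` is hypothetical.

```python
def count_components(num_vertices, faces):
    """Count connected components of a triangle mesh with union-find.

    Assumption: `faces` is an iterable of (a, b, c) vertex-index triples.
    Vertices not referenced by any face are ignored.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent links to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    used = set()
    for a, b, c in faces:
        union(a, b)
        union(b, c)
        used.update((a, b, c))

    # Each distinct root among the used vertices is one component.
    return len({find(v) for v in used})


# Two triangles sharing an edge, plus one detached triangle.
faces = [(0, 1, 2), (1, 2, 3), (4, 5, 6)]
print(count_components(7, faces))  # prints 2
```

A raw component count cannot tell a legitimately multi-part object from a fractured one, so a comparison against the count for the unperturbed input is likely more informative than an absolute threshold.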
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper presents an interesting application of an existing activation-patching method to identify geometry-related representations within 3D latent diffusion models.
2. The proposed meltdown phenomenon is novel and well-characterized, although it remains unclear whether similar behavior would be observed on other surface reconstruction datasets beyond Google Scanned Objects (GSO).
3. The proposed PowerRemap intervention is simple yet effective, demonstrating strong recovery performance on WaLa
1. The generalizability of this finding is very limited. The experiments focus on two models (WaLa and Make-a-Shape) and evaluate the meltdown on only one dataset (Google Scanned Objects). It is unknown whether the meltdown phenomenon is unique to the GSO dataset, or whether the cross-attention head that controls the meltdown can be found in latent 3D diffusion transformers other than WaLa and Make-a-Shape.
2. As shown in Tables 2 and 3 in Appendix B.3 (p. 21), the effectiveness of PowerRemap di
A clean activation-patching grid over depth×time pinpoints a single early cross-attention write controlling meltdown; the procedure and repair map are explicit. PowerRemap is model-agnostic, test-time only, and provably reduces spectral entropy without changing singular vectors. On GSO, meltdown occurs widely, and PowerRemap rescues 98.3% of WaLa failures.
For Make-a-Shape, the reported rescue rate is only 10.1% with the same γ, suggesting sensitivity to architecture and hyperparameters and limiting generality. Spectral entropy is the only diagnostic evaluated; there is no comparison to effective rank, top-k energy, condition number, per-head concentration, or Jacobian norms. “Connected components” may conflate legitimate multi-part objects with failures; precision/recall vs. human labels is not reported. γ selection is ad hoc (global γ=100); the paper itself note
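To make the diagnostics named in this review concrete, here is a minimal sketch that computes them from a list of singular values and applies a power-law remap to the spectrum. Everything in it is an assumption for illustration: this page does not give the paper's exact definitions, so the sum normalization in `spectral_entropy`, the form s → s^γ rescaled to keep the largest singular value, and all function names are hypothetical. Because only the spectrum is touched, the singular vectors are unchanged by construction.

```python
import math


def spectral_entropy(s):
    """Shannon entropy of the singular values normalized to sum to 1 (assumed definition)."""
    total = sum(s)
    probs = [x / total for x in s if x > 0]
    return -sum(p * math.log(p) for p in probs)


def effective_rank(s):
    """Exponential of the spectral entropy (the entropy-based effective rank)."""
    return math.exp(spectral_entropy(s))


def top_k_energy(s, k):
    """Fraction of squared singular-value mass held by the k largest values."""
    energy = sorted((x * x for x in s), reverse=True)
    return sum(energy[:k]) / sum(energy)


def condition_number(s):
    """Ratio of largest to smallest singular value; infinite if the smallest is zero."""
    lo = min(s)
    return math.inf if lo == 0 else max(s) / lo


def power_remap(s, gamma):
    """Hypothetical remap: raise the max-normalized spectrum to `gamma`, keep the top value."""
    top = max(s)
    return [top * (x / top) ** gamma for x in s]


s = [4.0, 2.0, 1.0, 1.0]
remapped = power_remap(s, 2.0)
print(remapped)  # prints [4.0, 1.0, 0.25, 0.25]
print(spectral_entropy(remapped) < spectral_entropy(s))  # prints True
```

With γ > 1 the remap concentrates mass on the leading singular values, which lowers spectral entropy and effective rank while raising top-k energy and the condition number, so in this toy setting the diagnostics move together rather than giving independent evidence.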
1. The meltdown phenomenon is a common issue in 3D diffusion models for shape completion and worth investigating. 2. The finding that a single cross-attention module is primarily responsible for the observed failure is particularly interesting and provides useful insight into the model’s internal behavior. 3. The discussion on diffusion dynamics is interesting and contributes to a better conceptual understanding of diffusion behavior.
1. The experiments are insufficient. The observed meltdown failure is likely to depend strongly on the density of the input point cloud, yet this factor is neither analyzed nor explicitly specified in the experiments. In addition, all experiments are conducted solely on the GSO dataset, which limits the generality of the conclusions. Including results on at least one additional dataset would significantly strengthen the empirical support for the proposed theory. 2. In Fig. 3, the trend of conne
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · 3D Shape Modeling and Analysis
