Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Itzhak Levi, Matthieu Wyart

TL;DR
This paper demonstrates that diffusion models can be used to probe and measure the hierarchical latent structure of data, revealing how data changes relate to underlying latent variables.
Contribution
The paper introduces forward-backward diffusion experiments, in which data is noised and then denoised to generate new samples, as a method for analyzing the latent hierarchical structure of high-dimensional data.
Findings
Changes in latent variables manifest as correlated chunks of change in the data.
The length scale of these chunks diverges at the noise level where a phase transition is known to take place.
The prediction is confirmed on both text and image datasets using state-of-the-art diffusion models.
Abstract
High-dimensional data must be highly structured to be learnable. Although the compositional and hierarchical nature of data is often put forward to explain learnability, quantitative measurements establishing these properties are scarce. Likewise, accessing the latent variables underlying such a data structure remains a challenge. In this work, we show that forward-backward experiments in diffusion-based models, where data is noised and then denoised to generate new samples, are a promising tool to probe the latent structure of data. We predict in simple hierarchical models that, in this process, changes in data occur by correlated chunks, with a length scale that diverges at a noise level where a phase transition is known to take place. Remarkably, we confirm this prediction in both text and image datasets using state-of-the-art diffusion models. Our results show how latent variable…
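The measurement described in the abstract, changes that occur in correlated chunks, can be illustrated with a small sketch. It assumes a binary change indicator (1 where a position differs between the original and the regenerated sample, 0 otherwise) and a pooled connected correlation over positions and samples. The function names `change_indicator` and `change_correlation` are hypothetical, and the estimator is a generic one that may differ from the paper's exact definition.

```python
import numpy as np


def change_indicator(original, regenerated):
    """Return 1.0 where the two arrays differ and 0.0 elsewhere.

    Hypothetical helper: marks which tokens (or patches) changed between
    a sample and its noised-then-denoised counterpart.
    """
    original = np.asarray(original)
    regenerated = np.asarray(regenerated)
    if original.shape != regenerated.shape:
        raise ValueError("original and regenerated must have the same shape")
    return (original != regenerated).astype(float)


def change_correlation(sigma, max_r):
    """Pooled connected correlation of changes at distances 0..max_r.

    sigma: array of shape (n_samples, length) with 0/1 change indicators.
    For each distance r, computes <s_i s_{i+r}> - <s_i><s_{i+r}>, averaging
    over all samples and all valid positions i (an assumed estimator).
    """
    sigma = np.asarray(sigma, dtype=float)
    if sigma.ndim != 2:
        raise ValueError("sigma must have shape (n_samples, length)")
    length = sigma.shape[1]
    if not 0 <= max_r < length:
        raise ValueError("max_r must satisfy 0 <= max_r < length")

    corr = np.empty(max_r + 1)
    for r in range(max_r + 1):
        left = sigma[:, : length - r]   # indicators at position i
        right = sigma[:, r:]            # indicators at position i + r
        corr[r] = np.mean(left * right) - np.mean(left) * np.mean(right)
    return corr
```

In a forward-backward experiment, each row of `sigma` would come from one original sample and its regenerated version at a fixed noise level. Repeating the measurement across noise levels would show how quickly the correlation decays with distance at each level, which is the kind of length scale the abstract refers to.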
Peer Reviews
Decision: ICLR 2025 Poster
(1) The paper introduces novel approaches for analyzing the structure of inputs using pretrained diffusion and language models. (2) The authors offer a thorough analysis and derivation, with experimental results closely aligning with theoretical expectations. (3) Multiple schematic diagrams and data visualizations are included, providing valuable insights into the methods.
(1) The paper’s presentation could be improved. While there are numerous figures to aid understanding, the main text is somewhat challenging to follow. (2) Why is the σ in Equation 3 binary? Wouldn’t a continuous measurement be more appropriate? For instance, a small difference in pixel values might not alter the semantic structure of the images, but it would be captured by binary measurement. (3) Shouldn’t the spatial correlation structures be content-dependent? For example, if the bird and t
1. The hierarchical perspective provides novel insights into the diffusion model's mechanism and the application of physics is also refreshing. I feel that the community can benefit from these insights, which may give rise to empirical advancements. 2. The paper is well written and clearly communicates the main ideas. 3. The experiments on natural data (image/text) support the theoretical claims.
1. It'd be great to see attempts at utilizing the theoretical/empirical observations to advance practical model design. Some discussions along this direction would also be appreciated. 2. The tree model seems overly simplified for real-world data like images and languages. For example, one would imagine two high-level variables could become co-parents for some low-level variables, thus breaking the tree structure. I would appreciate a discussion on this limitation and the applicability of the th
1. The authors aim to capture hidden hierarchical structures within discrete data using the RHM model, with their RHM+BP framework supporting both discrete and continuous diffusion processes. 2. By applying BP for denoising, the authors rigorously analyze phase transitions in the denoising results and identify the critical noise level needed to induce a change in the data class (or low-level feature).
1. The paper is somewhat disorganized and hard to follow, as definitions, derivations, and experimental results are heavily interwoven. To improve clarity, consider using theorems or structured definitions to better organize the content (e.g. by moving some derivations, such as Equations 8 and 9 to appendix and summarizing them as a main theorem). 2. In practice, people use real data + score-based denoising; however, the authors use RHM data + BP denoising instead. This discrepancy is insuffici
Taxonomy
Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies
Methods: Diffusion
