On the flow matching interpretability
Francesco Pivi, Simone Gazza, Davide Evangelista, Roberto Amadini, Maurizio Gabbrielli

TL;DR
This paper introduces a physically constrained flow matching framework for generative models, making their intermediate steps interpretable by aligning them with known physical distributions, demonstrated using the 2D Ising model.
Contribution
It proposes a novel method to embed physical semantics into flow-based generative models, enhancing interpretability of the intermediate steps.
Findings
Preserves physical fidelity in generated configurations.
Outperforms Monte Carlo in generation speed for larger lattices.
Enables meaningful interpretation of flow steps as physical states.
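One way to picture "flow steps as physical states" is a path that passes through a fixed sequence of latent waypoints, each tied to one temperature. The sketch below is illustrative only: this page does not reproduce the paper's equations, so the equal-length time segments, the function name `piecewise_linear_point`, and the toy waypoints are all assumptions. It shows the point on a piecewise linear path and the constant velocity a flow matching regression would target inside each segment.

```python
import numpy as np

def piecewise_linear_point(waypoints, t):
    """Point and velocity on a piecewise linear path through `waypoints`.

    waypoints: array of shape (K + 1, d); row 0 plays the role of noise and
               rows 1..K the role of latent codes at successive temperatures
               (hypothetical setup, not taken from the paper).
    t:         flow time in [0, 1]; knot k sits at t = k / K (assumed spacing).
    """
    K = waypoints.shape[0] - 1
    k = min(int(t * K), K - 1)   # segment index; t = 1 belongs to the last segment
    s = t * K - k                # local time inside the segment, in [0, 1]
    x_t = (1.0 - s) * waypoints[k] + s * waypoints[k + 1]
    v_t = K * (waypoints[k + 1] - waypoints[k])  # dx/dt is constant per segment
    return x_t, v_t

# Toy example with three waypoints in 2D.
waypoints = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])
x_t, v_t = piecewise_linear_point(waypoints, 0.25)
```

With this parameterization, every knot time corresponds to exactly one waypoint, so an intermediate state at a knot can be read as the state at that waypoint's temperature.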
Abstract
Generative models based on flow matching have demonstrated remarkable success in various domains, yet they suffer from a fundamental limitation: the lack of interpretability in their intermediate generation steps. These models learn to transform noise into data through a series of vector field updates, but the meaning of each step remains opaque. We address this problem by proposing a general framework that constrains each flow step to be sampled from a known physical distribution. Flow trajectories are mapped to (and constrained to traverse) the equilibrium states of the simulated physical process. We implement this approach through the 2D Ising model in such a way that flow steps become thermal equilibrium points along a parametric cooling schedule. Our proposed architecture includes an encoder that maps discrete Ising configurations into a continuous latent space, a…
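For readers unfamiliar with the physical side, the following is a minimal sketch of how equilibrium Ising configurations along a cooling schedule could be drawn with single-spin Metropolis updates. It is not the authors' code: the coupling J = 1, the periodic boundaries, the lattice size, the schedule values in `betas`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One sweep of L*L single-spin Metropolis proposals at inverse temperature beta (J = 1)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbours with periodic boundaries.
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn  # energy change if spin (i, j) is flipped
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] = -spins[i, j]
    return spins

def energy_per_site(spins):
    """Energy per site; each bond is counted once via the right and down neighbours."""
    return -np.mean(spins * (np.roll(spins, -1, 0) + np.roll(spins, -1, 1)))

def sample_along_schedule(L=8, betas=(0.1, 0.3, 0.6), sweeps=200, seed=0):
    """Cool one lattice through increasing beta and keep a snapshot at each value."""
    rng = np.random.default_rng(seed)
    spins = rng.choice(np.array([-1, 1]), size=(L, L))  # random hot start
    snapshots = []
    for beta in betas:
        for _ in range(sweeps):
            metropolis_sweep(spins, beta, rng)
        snapshots.append((beta, spins.copy()))
    return snapshots

for beta, s in sample_along_schedule():
    print(f"beta={beta:.2f}  |m|={abs(s.mean()):.3f}  e={energy_per_site(s):.3f}")
```

Snapshots of this kind are what the abstract calls thermal equilibrium points. In the proposed framework, sampling of this sort would supply training targets, with the learned flow replacing the sampler at generation time.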
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
The paper is clearly written and the proposed approach is clearly presented. The Ising setup has a ground-truth solution and many well-studied properties, such as an analytical form of the probability density and of the normalization constant. Bridging generative modeling and statistical physics is an interesting research direction.
The motivation of the selected approach is not clear. The described numerical experiments and conclusions are not applicable to other pretrained flow matching models.
1. The paper is generally well written and easy to understand.
2. The experiments demonstrate that the flow approach amortizes the cost of drawing equilibrium samples, since expensive sampling is performed only once during training.
Early in the paper, the authors motivate their work by pointing out that the intermediate steps of a flow matching model are not interpretable. It is also claimed around line 77 that 'This interpretability extends beyond the specific physical system, providing a general methodology for constraining generative flows to meaningful intermediate representations.' But it appears to me that the overall structure of your pipeline is similar to many examples of NN-amortized MCMC that are already present
The paper is well written. The preliminaries are explained very well, and the write-up on the Ising model is very good.
I believe that the approach is not generalisable to target distributions without a scalar parameter; e.g., to the standard problem of image generation, or to other lattice systems that have more than one scalar parameter. The paper begins with the aim of making flow matching interpretable; however, the results do not discuss interpretability. They discuss sampling speed and sample statistics. I am unsure whether the proposed method contributes to interpretability itself. In the experiments, th
1. **Novel perspective on interpretability**: The paper addresses a genuine gap in generative modeling: intermediate steps in diffusion/flow models lack clear semantic meaning. Constraining trajectories to physical equilibrium states is a creative solution.
2. **Clear presentation**: The encoder-latent field-projector pipeline is well-described and easy to follow. The piecewise linear interpolation strategy is sensible.
### Major Issues
1. **Extremely limited scope**: Evidence is restricted to a single, well-studied toy system (2D Ising with macroscopic metrics). No other physical systems, no real data, no standard ML benchmarks. This severely limits the impact of the results: the paper claims a "general framework" but demonstrates only a single instantiation.
2. **Interpretability is not rigorously measured**: The paper defines interpretability by construction (alignment with β) but provides
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Materials Science · Quantum many-body systems
