Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion
Aditya Shankar, Yuandou Wang, Rihan Hai, Lydia Y. Chen

TL;DR
HARPOON is a manifold-guided diffusion approach for conditional tabular data generation. It enforces diverse constraints at inference time, avoiding the training-time conditioning strategies that prevent existing methods from generalising to unseen constraints.
Contribution
The paper extends manifold theory to tabular data and develops HARPOON, a diffusion model that effectively guides samples along the manifold to satisfy various conditions at inference.
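As a rough geometric intuition for guiding a sample along a manifold, the toy sketch below slides a 2D point along the unit circle until an inequality condition holds. It is not HARPOON's algorithm: the unit circle stands in for a learned data manifold, no diffusion model is involved, and the names `tangent_step` and `guide`, the bound 0.5, and the step size are all assumptions made for illustration.

```python
import math

def tangent_step(x, grad, lr):
    """Project grad onto the tangent of the unit circle at x, step, then retract."""
    dot = grad[0] * x[0] + grad[1] * x[1]           # normal component (x is the unit normal)
    g_tan = (grad[0] - dot * x[0], grad[1] - dot * x[1])
    y = (x[0] - lr * g_tan[0], x[1] - lr * g_tan[1])
    norm = math.hypot(y[0], y[1])
    return (y[0] / norm, y[1] / norm)               # back onto the circle

def guide(x0, bound=0.5, lr=0.1, max_steps=200):
    """Slide x0 along the circle until the hinge loss max(x[0] - bound, 0) is zero."""
    norm = math.hypot(x0[0], x0[1])
    x = (x0[0] / norm, x0[1] / norm)
    for _ in range(max_steps):
        if x[0] <= bound:                           # condition satisfied, loss is zero
            break
        x = tangent_step(x, (1.0, 0.0), lr)         # gradient of the active hinge loss
    return x

# Hypothetical starting point that violates x[0] <= 0.5.
x = guide((1.0, 0.2))
```

The point stays on the circle throughout because only the tangential part of the gradient is used and each step is followed by a retraction. A plain gradient step would instead pull the point off the circle.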
Findings
HARPOON outperforms existing methods in imputation tasks.
It effectively enforces inequality constraints on tabular data.
The approach demonstrates strong generalization across multiple datasets.
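The two condition types named in the findings can each be written as a loss that is zero when the condition holds. The sketch below is one plausible formulation, not the losses used in the paper; the function names, the absolute-error choice, and the example row are assumptions.

```python
def imputation_loss(row, observed, mask):
    """Mean absolute error over observed entries (mask[i] is True where a value is known)."""
    errors = [abs(r - o) for r, o, m in zip(row, observed, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0

def inequality_loss(row, col, upper):
    """Hinge penalty: zero when row[col] <= upper, else the size of the violation."""
    return max(row[col] - upper, 0.0)

# Hypothetical generated row with three numeric features.
sample = [0.75, 3.0, 5.0]

# Features 0 and 2 are observed; feature 1 is missing and is ignored by the mask.
print(imputation_loss(sample, [0.5, 0.0, 5.0], [True, False, True]))  # 0.125

# Feature 1 must not exceed 2.0; the sample violates this by 1.0.
print(inequality_loss(sample, col=1, upper=2.0))  # 1.0
```

Neither loss is a squared error, which matches the reviewers' remark that the method does not require squared losses.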
Abstract
Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not generalise to unseen constraints during inference, and struggle to handle conditional tasks beyond tabular imputation. While manifold theory offers a principled way to guide generation, current formulations are tied to specific inference-time objectives and are limited to continuous domains. We extend manifold theory to tabular data and expand its scope to handle diverse inference-time objectives. On this foundation, we introduce HARPOON, a tabular diffusion method that guides unconstrained samples along the manifold geometry to satisfy diverse tabular conditions at inference. We validate our theoretical contributions empirically on tasks such as imputation and enforcing inequality constraints,…
Peer Reviews
Decision: ICLR 2026 Poster
The paper is very well written and provides a lot of intuitions about the results they propose. I really appreciated the visualisations of the gradients and the updates. The experimental analysis is extensive and with very positive results. The authors give a very nice geometric explanation of why their method works. **Note:** It is difficult for me to assess the novelty of this work wrt the previous works on diffusion models as I am not familiar with them.
1. The authors do not report the sampling generation time. As this is an important metric for tabular data generation, it would be nice to have it. 2. In Theorem 3.2, $\mathcal{C}$ is not defined. It is also not clear which conditions are placed on $\mathcal{C}$. Can it really be any arbitrary information? For example, (Stoian & Giunchiglia, 2025) extended the work cited in your paper to constraints expressed as disjunctions over linear inequalities. This defines non-convex and disconnected spac
1. The paper is well written and provides both theoretical and experimental contributions. 2. The theory is novel and is the first to use the diffusion model's orthogonal projection onto manifolds in the tabular setting. 3. HARPOON can handle mixed data types and has a much lower constraint violation rate when doing conditional generation.
1. Computation time might be an issue, but this is not discussed in the experiments. 2. The utility, fidelity, and privacy aspects of the generated tabular data are not discussed. It would be great to see where HARPOON stands on these three aspects.
- The authors generalize previous results, extending the usefulness of manifold-based insights to tabular data. In particular, they remove the necessity of squared losses, which extends the effective conditioning capabilities. - The ability to condition on certain features or constraints at inference time without the need for training the model on that specific conditional generation task is very valuable, in particular for tabular data. - The illustrations mostly paint an intuitive picture of t
- The methodological background on diffusion is basically non-existent and the overview of the tabular diffusion models is severely underdeveloped. Since TabDDPM, many models have been proposed that 1) often considerably outperform TabDDPM and 2) do not rely on multinomial diffusion. In fact, models that treat categorical data in some continuous space exist (e.g., TabSyn [1] or CDTD [2]). It is unclear how the results extend to such models. - The missingness rates of 0.5 and 0.75 in the main res
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
