LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations
Pengpeng Xiao, Phillip Si, Peng Chen

TL;DR
LD-EnSF introduces a fast, efficient data assimilation method that operates in a learned latent space, significantly reducing computational costs while accurately handling sparse, noisy observations in high-dimensional systems.
Contribution
The paper presents LD-EnSF, a novel latent-space, score-based data assimilation approach that eliminates full-space simulations during filtering and improves efficiency through learned surrogate dynamics and a history-aware observation encoder.
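Replacing full-space simulations with learned latent surrogate dynamics is central to the contribution. The sketch below is a minimal LDNet-style surrogate. It assumes the usual LDNet split into two networks. A dynamics network F advances a low-dimensional latent state given parameters. A separate reconstruction network G evaluates the field at arbitrary query coordinates. The weights are random placeholders; a real model would be trained on simulation data. All names (`init_mlp`, `latent_forecast`, `decode`) and sizes are hypothetical. Forward Euler stands in for whatever integrator the paper actually uses.

```python
import numpy as np


def init_mlp(sizes, rng):
    """Random (untrained, placeholder) weights for a tanh MLP: list of (W, b) pairs."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(params, x):
    """Apply the MLP row-wise to x of shape (batch, in_dim)."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b


rng = np.random.default_rng(0)
LATENT_DIM, PARAM_DIM, COORD_DIM = 8, 2, 2  # hypothetical sizes
# Dynamics network F: (latent state s, parameters mu) -> ds/dt
F = init_mlp([LATENT_DIM + PARAM_DIM, 32, LATENT_DIM], rng)
# Reconstruction network G: (latent state s, spatial coordinate x) -> field value
G = init_mlp([LATENT_DIM + COORD_DIM, 32, 1], rng)


def latent_forecast(s, mu, dt, n_steps):
    """Advance an ensemble of latent states s (N, LATENT_DIM) with forward Euler.

    mu has shape (N, PARAM_DIM): one parameter vector per ensemble member.
    """
    for _ in range(n_steps):
        s = s + dt * mlp(F, np.concatenate([s, mu], axis=1))
    return s


def decode(s, x):
    """Evaluate the field at query points x (P, COORD_DIM) for each member; returns (N, P)."""
    n, p = s.shape[0], x.shape[0]
    # Pair every member's latent state with every query coordinate
    inputs = np.concatenate([np.repeat(s, p, axis=0), np.tile(x, (n, 1))], axis=1)
    return mlp(G, inputs).reshape(n, p)
```

The forecast only touches an N × 8 latent array, so each step of ensemble propagation is a few small matrix products. `decode` is needed only when a full-field estimate is requested. This is in the spirit of the decoder-free filtering loop the reviewers describe.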
Findings
Achieves orders-of-magnitude speedup over existing methods
Maintains high accuracy and robustness with sparse, noisy data
Effective on high-dimensional benchmarks
Abstract
Data assimilation techniques are crucial for accurately tracking complex dynamical systems by integrating observational data with numerical forecasts. Recently, score-based data assimilation methods have emerged as powerful tools for high-dimensional and nonlinear data assimilation. However, these methods still incur substantial computational costs due to the need for expensive forward simulations. In this work, we propose LD-EnSF, a novel score-based data assimilation method that eliminates the need for full-space simulations by evolving dynamics directly in a compact latent space. Our method incorporates improved Latent Dynamics Networks (LDNets) to learn accurate surrogate dynamics and introduces a history-aware LSTM encoder to effectively process sparse and irregular observations. By operating entirely in the latent space, LD-EnSF achieves speedups of orders of magnitude over existing…
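The abstract describes score-based filtering carried out entirely in latent space. As a rough illustration, the sketch below performs one EnSF-style analysis step on a latent ensemble. The prior score is estimated by Monte Carlo over the forecast ensemble, which needs no training, as in EnSF. A damped likelihood score is added, and a reverse-time SDE is integrated with Euler-Maruyama. The assumptions are loud:

- `y` stands in for an observation already mapped into latent coordinates, for example by the LSTM encoder.
- `H` is a linear latent observation operator with Gaussian noise.
- The schedules α_t = 1 - t(1 - ε), β_t² = t and the damping h(t) = 1 - t are choices made for this sketch.
- The function name `ensf_latent_update` and all defaults are hypothetical; this is not the authors' code.

```python
import numpy as np


def ensf_latent_update(prior_ens, y, H, obs_std, n_steps=100,
                       eps_alpha=0.05, t_end=0.01, seed=0):
    """One EnSF-style analysis step on a latent ensemble (illustrative sketch).

    prior_ens: (J, d) forecast ensemble of latent states.
    y:         (m,)   observation in latent coordinates (assumption).
    H:         (m, d) linear latent observation operator (assumption).
    obs_std:   scalar Gaussian observation-noise std (assumption).
    Returns a (J, d) analysis ensemble (samples at t = t_end, close to t = 0).
    """
    rng = np.random.default_rng(seed)
    J, d = prior_ens.shape
    times = np.linspace(1.0, t_end, n_steps + 1)
    z = rng.standard_normal((J, d))  # at t = 1 the diffused prior is ~ N(0, I)
    for k in range(n_steps):
        t, dt = times[k], times[k] - times[k + 1]
        a = 1.0 - t * (1.0 - eps_alpha)  # alpha_t
        b2 = t                           # beta_t^2
        # Monte Carlo prior score: the diffused empirical prior is a Gaussian
        # mixture with components N(alpha_t * x_j, beta_t^2 I), one per member.
        diff = z[:, None, :] - a * prior_ens[None, :, :]      # (J, J, d)
        logw = -np.sum(diff ** 2, axis=-1) / (2.0 * b2)
        logw -= logw.max(axis=1, keepdims=True)               # stable softmax
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        prior_score = -np.einsum("ij,ijd->id", w, diff) / b2
        # Likelihood score H^T (y - H z) / sigma^2, damped by h(t) = 1 - t
        lik_score = (y - z @ H.T) @ H / obs_std ** 2
        score = prior_score + (1.0 - t) * lik_score
        # Forward SDE dz = f z dt + g dW consistent with alpha_t and beta_t
        f = -(1.0 - eps_alpha) / a
        g2 = 1.0 - 2.0 * f * b2
        # Reverse-time Euler-Maruyama step from t to t - dt
        noise = rng.standard_normal((J, d))
        z = z - (f * z - g2 * score) * dt + np.sqrt(g2 * dt) * noise
    return z
```

Take a 1-D standard-normal prior ensemble with an observation y = 2 and noise std 0.5. The exact Gaussian posterior mean is 1.6, and the analysis mean should move from about 0 toward that value. The damped update is a heuristic, so expect only approximate agreement.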
Peer Reviews
Decision: ICLR 2026 Poster
Clear Motivation and Relevance: The paper tackles a key limitation of recent score-based filters, namely their high computational cost and poor performance with sparse observations. Solid Technical Design: The integration of latent surrogate dynamics (LDNet) and score-based Bayesian filtering (EnSF) is smart. The introduction of a history-aware LSTM observation encoder effectively extends the latent assimilation framework to handle irregular and sparse data. Comprehensive Experiments: The autho…
Limited Theoretical Novelty: The proposed method primarily combines existing techniques (LDNet, EnSF, LSTM encoding). While the combination is well executed and impactful, the theoretical advancement is modest; the novelty lies more in the integration and empirical rigor. Benchmark Coverage and Positioning: Comparisons are limited to EnSF, Latent-EnSF, and LETKF. While these are strong and relevant baselines, the paper could benefit from a clearer discussion of recent efficient variational and…
Decoder-free assimilation loop: All filtering happens in latent space (state + params), avoiding per-step decoding and cutting both compute and error accumulation. Efficiency at scale: Small latent dimension + ensemble updates → orders-of-magnitude cheaper than full-state DA; design is hardware-friendly and parallelizable. Robustness features: Reverse-SDE damping and simple latent noise modeling make the update stable under severe sparsity/noise. Clear training recipe: Two-stage LDNet trainin…
Overall, the paper presents a clear contribution with thorough experimentation. My major concerns are as follows. Related Work Missing: The paper under-cites several highly relevant 2024–2025 works that would strengthen its positioning. Neural Operators for DA and Semilinear PDEs: The Fourier Neural Operator and SFNO have shown strong results in PDE and weather modeling, but the paper does not discuss them. Additionally, the Semilinear Neural Operator (ICLR 2024), which proposes a recurs…
Significant Practical Advance: The paper addresses a critical bottleneck in data assimilation – the computational cost under high-dimensional, sparse observation settings. By eliminating full-state simulations during filtering, LD-EnSF achieves massive speedups (e.g. 200,000× in one case), enabling applications (real-time forecasting, larger ensembles) that were previously infeasible. This practical improvement is highly valuable for the community. Robust Accuracy under Extreme Conditions: Empi…
Dependence on Offline Training and Generalization Limits: A potential concern is the heavy reliance on training a surrogate model (LDNet) on simulation data before deployment. Acquiring a comprehensive training dataset covering all relevant system behaviors and parameter ranges can be costly, and the method’s performance may degrade if the true system behavior deviates from the training distribution. The paper’s approach is essentially as good as its learned model – for scenarios with significa…
Taxonomy
Topics: Meteorological Phenomena and Simulations · Hydrological Forecasting Using AI · Hydrology and Watershed Management Studies
