OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning
Kevin Valencia, Thilina Balasooriya, Xihaier Luo, Shinjae Yoo, David Keetae Park

TL;DR
OmniField is a novel neural framework that adaptively learns from sparse, noisy, and varying multimodal spatiotemporal data, enabling robust reconstruction, interpolation, and forecasting across modalities.
Contribution
It introduces a continuity-aware neural field conditioned on available modalities with a cross-modal fusion architecture for flexible, robust multimodal learning.
Findings
Outperforms eight strong baselines in multimodal tasks.
Maintains high performance under heavy sensor noise.
Enables unified reconstruction, interpolation, and forecasting.
Abstract
Multimodal spatiotemporal learning on real-world experimental data is constrained by two challenges: within-modality measurements are sparse, irregular, and noisy (QA/QC artifacts) but cross-modally correlated; the set of available modalities varies across space and time, shrinking the usable record unless models can adapt to arbitrary subsets at train and test time. We propose OmniField, a continuity-aware framework that learns a continuous neural field conditioned on available modalities and iteratively fuses cross-modal context. A multimodal crosstalk block architecture paired with iterative cross-modal refinement aligns signals prior to the decoder, enabling unified reconstruction, interpolation, forecasting, and cross-modal prediction without gridding or surrogate preprocessing. Extensive evaluations show that OmniField consistently outperforms eight strong multimodal…
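To make "a continuous field conditioned on whichever modalities are available" concrete, the following is a minimal toy sketch, not the authors' implementation. Plain masked averaging stands in for the crosstalk block and iterative refinement, whose details are not given in this summary. All names (`fourier_features`, `fuse_available`, `conditioned_field`), the modality labels, and the linear readout are hypothetical and chosen only for illustration.

```python
import math


def fourier_features(coord, num_bands=4):
    """Encode a scalar coordinate with sin/cos pairs at octave-spaced frequencies."""
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * coord))
        feats.append(math.cos(freq * coord))
    return feats


def fuse_available(embeddings):
    """Average per-modality embeddings, skipping modalities marked missing (None).

    Assumption: a simple mean replaces the paper's learned cross-modal fusion.
    """
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


def conditioned_field(coord, embeddings, weights):
    """Toy field: linear readout over [coordinate features, fused context]."""
    x = fourier_features(coord) + fuse_available(embeddings)
    return sum(w * v for w, v in zip(weights, x))


# Hypothetical example: the humidity sensor is missing at this location and time.
embeddings = {
    "temperature": [1.0, 0.0],
    "humidity": None,
    "wind": [0.0, 1.0],
}
fused = fuse_available(embeddings)
value = conditioned_field(0.0, embeddings, [1.0] * 10)
```

Because the field is queried at a coordinate rather than a grid cell, the same function serves reconstruction, interpolation, and forecasting in this sketch. Because missing modalities are skipped rather than imputed, any non-empty subset can be passed at train or test time.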
Peer Reviews
Decision: ICLR 2026 Poster
* The paper is well-structured and clearly motivates the problem by identifying two central challenges, data sparsity and multimodal inconsistency, and proposing targeted solutions through the MCT, ICMR, and Fleximodal Fusion modules. The organization is logical.
* The figures are well-designed and self-explanatory, effectively supporting the paper’s claims and illustrating the benefits of the proposed modules. Quantitative results are straightforward and convincing, showing consistent gains
* While the paper is generally well written, some parts are conceptually dense and abstract. The presentation could benefit from additional intuition, clearer intermediate explanations, or a small running example to illustrate how each proposed component (MCT, ICMR, Fleximodal Fusion) functions in practice.
* The forecasting horizon studied in the current experiments is relatively short (e.g., six-hour prediction on ClimSim-THW). Evaluating longer temporal horizons could provide deeper insights
- The proposed fleximodal fusion and iterative refinement approach provides a principled mechanism for handling missing and noisy modalities, improving robustness in settings with irregular or sparse sensors.
- The incorporation of frequency-rich embeddings and sinusoidal initialization yields measurable gains in high-frequency signal reconstruction, particularly in spatiotemporal domains.
- The method showed consistent performance improvements across multiple scientific datasets, suggesting a g
- The evaluation focuses on a curated set of scientific benchmarks; broader assessment on diverse multimodal domains (robotics, remote sensing beyond climate/air quality) would strengthen claims of generality.
- The computational and memory cost of iterative cross-modal refinement and continuous-field conditioning is not fully characterized. It's unclear how well the proposed method scales to higher-resolution or real-time applications.
- While robustness to missing modalities is a central motiv
- The datasets and benchmarks are comprehensive, spanning multiple applications.
- OmniField shows robustness and performance gains.
- The proposed components are validated through ablation studies.
- I am quite concerned about the novelty. The core framework remains a straightforward extension of SCENT with a few architectural augmentations for multimodal data.
- The authors give limited explanation of training efficiency and scalability. The computational complexity can grow with the number of tokens and modalities, but the paper offers little analysis of training or inference efficiency, nor does it discuss how OmniField might perform on larger-scale or real-time systems. Given that som
Taxonomy
Topics: Advanced Neural Network Applications · Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques
