OmniField: Conditioned Neural Fields for Robust Multimodal Spatiotemporal Learning

Kevin Valencia; Thilina Balasooriya; Xihaier Luo; Shinjae Yoo; David Keetae Park

arXiv:2511.02205·cs.LG·November 5, 2025

Open Access · 3 Reviews

TL;DR

OmniField is a novel neural framework that adaptively learns from sparse, noisy, and varying multimodal spatiotemporal data, enabling robust reconstruction, interpolation, and forecasting across modalities.

Contribution

It introduces a continuity-aware neural field conditioned on available modalities with a cross-modal fusion architecture for flexible, robust multimodal learning.

Findings

01 · Outperforms eight strong baselines in multimodal tasks.

02 · Maintains high performance under heavy sensor noise.

03 · Enables unified reconstruction, interpolation, and forecasting.

Abstract

Multimodal spatiotemporal learning on real-world experimental data is constrained by two challenges: within-modality measurements are sparse, irregular, and noisy (QA/QC artifacts) but cross-modally correlated; the set of available modalities varies across space and time, shrinking the usable record unless models can adapt to arbitrary subsets at train and test time. We propose OmniField, a continuity-aware framework that learns a continuous neural field conditioned on available modalities and iteratively fuses cross-modal context. A multimodal crosstalk block architecture paired with iterative cross-modal refinement aligns signals prior to the decoder, enabling unified reconstruction, interpolation, forecasting, and cross-modal prediction without gridding or surrogate preprocessing. Extensive evaluations show that OmniField consistently outperforms eight strong multimodal…
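The idea of a continuous field conditioned on whichever modalities happen to be available can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the crosstalk block and iterative refinement are replaced here by a plain average over the available modality contexts, and every name (`fourier_features`, `fuse_available`, `query_field`, `init_params`), the layer sizes, and the random weights are assumptions made only for illustration.

```python
import numpy as np


def fourier_features(coords, num_freqs=4):
    """Encode continuous coordinates of shape (N, D) with sines and cosines."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (F,)
    angles = coords[:, :, None] * freqs                  # (N, D, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)            # (N, D * 2F)


def fuse_available(contexts, available):
    """Average the context vectors of the modalities flagged as available.

    contexts: (M, C) array, one context vector per modality.
    available: (M,) boolean array. Rows of missing modalities are never read.
    """
    selected = contexts[np.asarray(available, dtype=bool)]
    if selected.shape[0] == 0:
        return np.zeros(contexts.shape[1])               # no modality present
    return selected.mean(axis=0)


def init_params(in_dim, hidden=16, out_dim=1, seed=0):
    """Random weights for a two-layer decoder (untrained, illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim),
        "b1": np.zeros(hidden),
        "W2": rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden),
        "b2": np.zeros(out_dim),
    }


def query_field(coords, contexts, available, params):
    """Evaluate the toy conditioned field at arbitrary continuous coordinates."""
    x = fourier_features(coords)                         # (N, D * 2F)
    z = fuse_available(contexts, available)              # (C,)
    h = np.concatenate([x, np.tile(z, (x.shape[0], 1))], axis=1)
    h = np.tanh(h @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]               # (N, out_dim)


# Usage: five (x, y, t) query points, three modalities, the second one missing.
rng = np.random.default_rng(1)
coords = rng.uniform(size=(5, 3))
contexts = rng.normal(size=(3, 8))
available = np.array([True, False, True])
params = init_params(in_dim=3 * 2 * 4 + 8)
print(query_field(coords, contexts, available, params).shape)
```

Because the decoder takes coordinates directly, the same function can be queried off-grid, which is the property the abstract refers to as working "without gridding". The availability mask is what lets one set of weights serve any subset of modalities in this sketch.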

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

* The paper is well-structured and clearly motivates the problem by identifying two central challenges (data sparsity and multimodal inconsistency) and proposing targeted solutions through the MCT, ICMR, and Fleximodal Fusion modules. The organization is logical.
* The figures are well-designed and self-explanatory, effectively supporting the paper’s claims and illustrating the benefits of the proposed modules. Quantitative results are straightforward and convincing, showing consistent gains

Weaknesses

* While the paper is generally well written, some parts are conceptually dense and abstract. The presentation could benefit from additional intuition, clearer intermediate explanations, or a small running example to illustrate how each proposed component (MCT, ICMR, Fleximodal Fusion) functions in practice.
* The forecasting horizon studied in the current experiments is relatively short (e.g., six-hour prediction on ClimSim-THW). Evaluating longer temporal horizons could provide deeper insights

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The proposed fleximodal fusion and iterative refinement approach provides a principled mechanism for handling missing and noisy modalities, improving robustness in settings with irregular or sparse sensors.
- The incorporation of frequency-rich embeddings and sinusoidal initialization yields measurable gains in high-frequency signal reconstruction, particularly in spatiotemporal domains.
- The method showed consistent performance improvements across multiple scientific datasets, suggesting a g

Weaknesses

- The evaluation focuses on a curated set of scientific benchmarks; broader assessment on diverse multimodal domains (robotics, remote sensing beyond climate/air quality) would strengthen claims of generality.
- The computational and memory cost of iterative cross-modal refinement and continuous-field conditioning is not fully characterized. It's unclear how well the proposed method scales to higher-resolution or real-time applications.
- While robustness to missing modalities is a central motiv

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The datasets and benchmarks are comprehensive, spanning multiple applications.
- OmniField shows robustness and performance gains.
- Proposed components are validated through ablation studies.

Weaknesses

- I am quite concerned about the novelty. The core framework remains a straightforward extension of SCENT with a few architectural augmentations for multimodal data.
- The authors give limited explanation of training efficiency and scalability. The computational complexity can grow with the number of tokens and modalities, but the paper offers little analysis of training or inference efficiency, and it does not discuss how OmniField might perform on larger-scale or real-time systems. Given that som

Taxonomy

Topics: Advanced Neural Network Applications · Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques