SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts
Paul Setinek, Gianluca Galletti, Thomas Gross, Dominik Schnürer, Johannes Brandstetter, Werner Zellinger

TL;DR
This paper introduces SIMSHIFT, a new benchmark dataset for evaluating neural surrogate models under distribution shifts in industrial simulations, and systematically assesses the effectiveness of Unsupervised Domain Adaptation techniques in this context.
Contribution
The paper presents SIMSHIFT, a comprehensive benchmark for distribution shift in neural surrogates, and extends UDA methods to improve their robustness in complex engineering simulations.
Findings
UDA methods show potential but face challenges under distribution shifts.
Neural surrogates often degrade significantly outside training distributions.
Open problems remain in achieving robust surrogates for industrial applications.
Abstract
Neural surrogates for Partial Differential Equations (PDEs) often suffer significant performance degradation when evaluated on problem configurations outside their training distribution, such as new initial conditions or structural dimensions. While Unsupervised Domain Adaptation (UDA) techniques have been widely used in vision and language to generalize across domains without additional labeled data, their application to complex engineering simulations remains largely unexplored. In this work, we address this gap through two focused contributions. First, we introduce SIMSHIFT, a novel benchmark dataset and evaluation suite composed of four industrial simulation tasks spanning diverse processes and physics: hot rolling, sheet metal forming, electric motor design and heatsink design. Second, we extend established UDA methods to state-of-the-art neural surrogates and systematically…
Peer Reviews
Decision·Submitted to ICLR 2026
1. Unsupervised domain adaptation (UDA) is a critical requirement in scientific machine learning because, for a neural surrogate to be useful, it needs to be able to "extrapolate" to unseen parameter configurations. This important problem has (unfortunately) not yet been investigated in depth by the scientific machine learning community. Hence, this paper is a good start towards proposing a benchmark dataset as well as quantitative evaluations of state-
1. The paper requires better organization: several critical sections defer to the appendix, which interrupts the flow of the paper and prevents a full understanding of the problem. One such point is the lack of a clear demarcation of source and target domains for the four datasets in the main paper (this is done in Appendices F1-F4, where actual parameter ranges are discussed, and in Appendix C, Table 11, where actual source and target distribution splits are pr
1. All architectures share the same conditioning network (sin–cos + shallow MLP, 8-dim latent), making the parametric setup consistent across models.
2. PointNet (global pooling), GraphSAGE with FiLM (local message passing), Transolver with DiT (attention with learned slicing), and UPT (latent field modeling for very large meshes) are adapted to conditional mesh regression and chosen to match scale constraints.
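The shared conditioning setup described in this review can be illustrated with a minimal sketch: simulation parameters are embedded with sin-cos features, passed through a shallow MLP to an 8-dimensional latent, and that latent modulates per-node features via FiLM (a per-channel scale and shift). Only the sin-cos embedding, the shallow MLP, the 8-dim latent, and the use of FiLM come from the text above; the class names, the number of frequencies, the hidden width, the ReLU activation, the zero-initialized biases, and the `(1 + scale)` form are assumptions made for illustration and may differ from the authors' implementation.

```python
import math
import random


def sincos_embed(value, num_freqs=4):
    """Sin-cos features of one scalar simulation parameter (num_freqs is assumed)."""
    feats = []
    for k in range(num_freqs):
        freq = 2.0 ** k
        feats.append(math.sin(freq * value))
        feats.append(math.cos(freq * value))
    return feats


def init_linear(n_in, n_out, rng):
    """Random weights and zero biases for a dense layer (illustrative init)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def linear(x, weight, bias):
    """Dense layer: weight @ x + bias, on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class ConditioningNet:
    """Hypothetical conditioning network: sin-cos embedding + shallow MLP -> latent."""

    def __init__(self, num_params, num_freqs=4, hidden=16, latent_dim=8, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        self.layer1 = init_linear(2 * num_freqs * num_params, hidden, rng)
        self.layer2 = init_linear(hidden, latent_dim, rng)

    def __call__(self, params):
        feats = [f for p in params for f in sincos_embed(p, self.num_freqs)]
        hidden = [max(0.0, v) for v in linear(feats, *self.layer1)]  # ReLU (assumed)
        return linear(hidden, *self.layer2)


class FiLM:
    """Feature-wise linear modulation: h -> (1 + scale(z)) * h + shift(z)."""

    def __init__(self, latent_dim, num_channels, seed=1):
        rng = random.Random(seed)
        self.to_scale = init_linear(latent_dim, num_channels, rng)
        self.to_shift = init_linear(latent_dim, num_channels, rng)

    def __call__(self, node_features, z):
        scale = linear(z, *self.to_scale)
        shift = linear(z, *self.to_shift)
        # The same scale and shift are applied to every mesh node.
        return [[(1.0 + s) * h + b for h, s, b in zip(node, scale, shift)]
                for node in node_features]


# Usage: two simulation parameters condition features on a two-node mesh.
cond = ConditioningNet(num_params=2)
z = cond([0.3, 1.2])                      # 8-dim conditioning latent
film = FiLM(latent_dim=8, num_channels=3)
modulated = film([[1.0, 2.0, 3.0], [0.5, -0.5, 0.0]], z)
```

Because the same latent `z` can feed any backbone, a setup along these lines keeps the parametric conditioning identical across models, which is the consistency the reviewer points to.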
1. The work extends established DA methods to neural surrogates and benchmarks them; it does not introduce a new neural operator or a new UDA/model-selection algorithm (the conditioning and FiLM/DiT choices are standardized, not novel).
2. While the paper reviews the neural operator literature, the baseline lineup focuses on PointNet/GNN/Transformer/UPT variants; adding FNO/Geo-FNO/GKN comparisons would better situate the results, especially in 3D large-mesh regimes.
3. The datasets are steady-state only
- The paper provides code and data for industry-relevant applications.
- Multiple models, UDA methods, and unsupervised model selection methods are tested.
- The supplied data and experimental details are thoroughly explained and provide a nice test bed for other researchers evaluating steady-state problems with UDA.
- It would be interesting to see the authors' hypotheses on why certain UDA methods and unsupervised model selection methods worked well on specific datasets.
- Covering only steady-state problems limits the scope of this benchmark; adding time-dependent problems would also be useful.
Taxonomy
Topics: Neural Networks and Applications
