Engineering Regression Without Real-Data Training: Domain Adaptation for Tabular Foundation Models Using Multi-Dataset Embeddings
Lyle Regenwetter, Rosen Yu, Cyril Picard, Faez Ahmed

TL;DR
This paper presents a domain adaptation method for engineering regression tasks based on synthetic data curation: by identifying engineering-like synthetic datasets, it improves tabular foundation model performance without training on real data.
Contribution
It introduces TREDBench, a curated collection of 83 real-world tabular regression datasets with expert engineering/non-engineering labels, and a synthetic data curation approach that improves foundation model transfer to engineering domains.
Findings
Synthetic data curation improves predictive accuracy.
Synthetic-only adaptation outperforms baseline models.
Data efficiency gains of 1.75x to 4.44x are achieved.
Abstract
Predictive modeling in engineering applications has long been dominated by bespoke models and small, siloed tabular datasets, limiting the applicability of large-scale learning approaches. Despite recent progress in tabular foundation models, the resulting synthetic training distributions used for pre-training may not reflect the statistical structure of engineering data, limiting transfer to engineering regression. We introduce TREDBench, a curated collection of 83 real-world tabular regression datasets with expert engineering/non-engineering labels, and use TabPFN 2.5's dataset-level embedding to study domain structure in a common representation space. We find that engineering datasets are partially distinguishable from non-engineering datasets, while standard procedurally generated datasets are highly distinguishable from engineering datasets, revealing a substantial synthetic-real…
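The idea of two groups of datasets being "partially" or "highly" distinguishable in an embedding space can be illustrated with a small sketch. This is not the paper's procedure: the embeddings below are random placeholder vectors rather than TabPFN 2.5 outputs, the helper names (`loo_centroid_accuracy`, `fake_embeddings`) are hypothetical, and leave-one-out nearest-centroid accuracy is just one simple stand-in for a distinguishability measure.

```python
import random
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_centroid_accuracy(group_a, group_b):
    """Leave-one-out nearest-centroid accuracy for two groups of embeddings.

    For balanced groups, about 0.5 means the groups are hard to tell apart
    and values near 1.0 mean they are highly distinguishable.
    Each group needs at least two members.
    """
    labelled = [(v, 0) for v in group_a] + [(v, 1) for v in group_b]
    correct = 0
    for i, (vec, label) in enumerate(labelled):
        # Hold out one embedding and recompute both centroids without it.
        rest = labelled[:i] + labelled[i + 1:]
        c0 = centroid([v for v, lab in rest if lab == 0])
        c1 = centroid([v for v, lab in rest if lab == 1])
        pred = 0 if sq_dist(vec, c0) <= sq_dist(vec, c1) else 1
        correct += pred == label
    return correct / len(labelled)


def fake_embeddings(n, dim, shift, rng):
    """Placeholder embeddings (assumption): Gaussian vectors centred at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
engineering = fake_embeddings(30, 8, 0.0, rng)
non_engineering = fake_embeddings(30, 8, 0.3, rng)  # small shift: overlapping groups
synthetic = fake_embeddings(30, 8, 3.0, rng)        # large shift: well-separated groups

acc_partial = loo_centroid_accuracy(engineering, non_engineering)
acc_high = loo_centroid_accuracy(engineering, synthetic)
print(f"engineering vs non-engineering: {acc_partial:.2f}")
print(f"engineering vs synthetic:       {acc_high:.2f}")
```

With real data, each placeholder vector would be replaced by one dataset-level embedding per dataset. A held-out accuracy well above chance but below 1.0 would correspond to "partially distinguishable", and an accuracy close to 1.0 to "highly distinguishable".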
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Materials Science · Advanced Graph Neural Networks
