GIT-BO: High-Dimensional Bayesian Optimization with Tabular Foundation Models
Rosen Ting-Ying Yu, Cyril Picard, Faez Ahmed

TL;DR
GIT-BO introduces a high-dimensional Bayesian optimization framework that leverages tabular foundation models and active subspace methods to improve efficiency and scalability without retraining.
Contribution
It combines a tabular foundation model with gradient-based subspace exploration for scalable, zero-shot Bayesian optimization in high dimensions.
Findings
Outperforms state-of-the-art GP-based BO methods in high-dimensional benchmarks.
Requires no online retraining, reducing computational overhead.
Effective across diverse real-world and synthetic problems up to 500 dimensions.
Abstract
Bayesian optimization (BO) struggles in high dimensions, where Gaussian-process surrogates demand heavy retraining and brittle assumptions, slowing progress on real engineering and design problems. We introduce GIT-BO, a Gradient-Informed BO framework that couples TabPFN v2, a tabular foundation model that performs zero-shot Bayesian inference in context, with an active-subspace mechanism computed from the model's own predictive-mean gradients. This aligns exploration to an intrinsic low-dimensional subspace via a Fisher-information estimate and selects queries with a UCB acquisition, requiring no online retraining. Across 60 problem variants spanning 20 benchmarks (nine scalable synthetic families and ten real-world tasks, e.g., power systems, Rover, MOPTA08, Mazda) with up to 500 dimensions, GIT-BO delivers a stronger performance-time trade-off than state-of-the-art GP-based methods…
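The loop the abstract describes (gradients of the surrogate's predictive mean, a low-dimensional subspace estimated from them, then a UCB-selected query) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `toy_mean` and `toy_std` are hypothetical stand-ins for the foundation model's predictive mean and uncertainty (TabPFN v2 is not called), gradients come from finite differences, the subspace is taken from the averaged gradient outer product (one common way to form such a matrix, which may differ in detail from the paper's Fisher-information estimate), and the candidate-sampling scheme, `beta`, `r`, and the bounds are arbitrary choices.

```python
import numpy as np

def toy_mean(X):
    # Hypothetical stand-in for a surrogate's predictive mean (NOT TabPFN):
    # it varies only along two hidden directions of the input space.
    return -(X[:, 0] - 0.5) ** 2 - (X[:, 1] + X[:, 2]) ** 2

def toy_std(X, X_obs):
    # Crude uncertainty proxy: distance to the nearest observed point.
    d = np.linalg.norm(X[:, None, :] - X_obs[None, :, :], axis=-1)
    return d.min(axis=1)

def mean_gradients(mean_fn, X, eps=1e-4):
    # Central finite differences; a differentiable model would use autograd.
    n, d = X.shape
    G = np.empty((n, d))
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        G[:, j] = (mean_fn(X + step) - mean_fn(X - step)) / (2 * eps)
    return G

def active_subspace(G, r):
    # Averaged outer product of gradients, then its top-r eigenvectors.
    H = G.T @ G / G.shape[0]
    _, eigvecs = np.linalg.eigh(H)      # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :r]      # (d, r) with orthonormal columns

def propose(mean_fn, std_fn, X_obs, y_obs, r=2, n_cand=512, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    V = active_subspace(mean_gradients(mean_fn, X_obs), r)
    x_best = X_obs[np.argmax(y_obs)]
    # Perturb the incumbent only along the subspace directions, then clip
    # to the (assumed) box bounds [-1, 1]^d.
    Z = rng.normal(scale=0.3, size=(n_cand, r))
    cand = np.clip(x_best + Z @ V.T, -1.0, 1.0)
    # Upper confidence bound for a maximization problem.
    ucb = mean_fn(cand) + beta * std_fn(cand, X_obs)
    return cand[np.argmax(ucb)], V

rng = np.random.default_rng(1)
X_obs = rng.uniform(-1.0, 1.0, size=(40, 50))   # 40 observations in 50-D
y_obs = toy_mean(X_obs)
x_next, V = propose(toy_mean, toy_std, X_obs, y_obs)
```

In this toy setup the recovered basis `V` concentrates on the first three coordinates, the only ones the stand-in mean depends on, so candidates are searched in 2 dimensions rather than 50. No model is refit inside `propose`, which mirrors the "no online retraining" point only loosely, since the stand-in surrogate here is a fixed function.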
Peer Reviews
Decision · ICLR 2026 Poster
++ This method combines frozen tabular foundation models (TFMs) with gradient-informed subspace discovery, and provides a novel fusion of amortized inference and classical dimension reduction.
++ Comprehensive benchmarking against SOTA methods, with rigorous statistical ranking and runtime analysis.
-- The performance depends on the pre-trained foundation model. The frozen TFM may not adapt well to functions outside its pre-training distribution, leading to poor performance on certain tasks. Additionally, no fine-tuning or domain adaptation is performed, which limits generalization to highly specialized or out-of-distribution objectives.
- The proposed method is faster (in terms of wall-clock time) while maintaining performance comparable to the existing state of the art
- The authors conduct plenty of interesting ablations, showing the importance of each of the components used in the final algorithm
- The additional theory in the appendix, while not particularly novel and easily following from preceding work, is still a nice addition for completeness
- Since the authors emphasise time complexity rather than pure sample complexity, which is the usual focus in the BO literature, it would be nice to demonstrate a problem setting where time complexity actually matters (e.g. high-throughput BO). In most classical BO problems, sample complexity is paramount, whereas wall-clock time is of secondary importance
- The authors seem to have focused on a relatively low-data regime, where fitting a GP is still relatively fast, it would be
The paper addresses a relevant problem. High-dimensional BO has received considerable attention in the past and is an active field of research. The paper presents the first method that uses PFNs, an interesting surrogate model because of their in-context capabilities, for high-dimensional BO. The approach is well-motivated, and the paper is well-written. The storyline is clear, and the paper features an extensive empirical evaluation that shows the benefits of the approach. The evaluation is open a
The main concern I have with this paper is the large performance degradation caused by minor modifications of the algorithm. For instance, Figure 6 shows that GIT-BO with expected improvement instead of the upper confidence bound performs considerably worse, even worse than TabPFN without the gradient-informed subspace. Similarly, Figure 9 shows that the technique for sampling candidates in the low-dimensional subspace is crucial for performance. I wouldn’t expect such a big impact from these choic
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Machine Learning and Algorithms
