TL;DR
PFN-TS is a Thompson sampling method that uses prior-data fitted networks (PFNs) to approximate Bayesian posteriors in contextual bandits. It comes with consistency and regret guarantees and performs strongly on synthetic, OpenML, and mobile-health benchmarks.
Contribution
The paper develops PFN-TS, a Thompson sampling algorithm that converts PFN posterior predictives into mean-reward samples using a subsampled variance estimator, and proves consistency and regret bounds for it.
Findings
PFN-TS achieves the top average rank on synthetic and OpenML benchmarks.
It remains competitive on linear and BART-generated rewards.
PFN-TS attains the highest estimated policy value in an offline mobile-health evaluation.
Abstract
Thompson sampling is a widely used strategy for contextual bandits: at each round, it samples a reward function from a Bayesian posterior and acts greedily under that sample. Prior-data fitted networks (PFNs), such as TabPFN v2+ and TabICL v2, are attractive candidates for this purpose because they approximate Bayesian posterior predictive distributions in a single forward pass. However, PFNs predict noisy future rewards, while Thompson sampling requires uncertainty over the latent mean reward function. We propose PFN-TS, a Thompson sampling algorithm that converts PFN posterior predictives into mean-reward samples using a subsampled predictive central limit theorem. The method estimates posterior variance from a geometric grid of dataset prefixes rather than the full predictive sequence used in previous predictive-sequence approaches, and reuses TabICL's cached…
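The two ingredients named in the abstract, a geometric grid of dataset prefixes and a greedy action under a sampled mean reward, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `geometric_prefix_grid` and `thompson_step` are hypothetical helper names, the grid ratio of 2 is an arbitrary choice, and the per-arm Gaussian mean and variance are stand-ins for whatever the PFN-based estimator would supply. The variance estimator itself is not reproduced here.

```python
import math
import random


def geometric_prefix_grid(n, ratio=2.0):
    """Return prefix lengths n, n/ratio, n/ratio^2, ... down to 1, sorted ascending.

    Hypothetical helper: the source only says the grid is geometric,
    so the ratio and the floor rounding are assumptions.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1")
    sizes = set()
    m = float(n)
    while m >= 1.0:
        sizes.add(int(m))  # floor to an integer prefix length
        m /= ratio
    return sorted(sizes)


def thompson_step(means, variances, rng):
    """Draw one mean-reward sample per arm and return the index of the best arm.

    Assumption: each arm's mean-reward posterior is summarized by a
    Gaussian with the given mean and variance.
    """
    samples = [rng.gauss(mu, math.sqrt(v)) for mu, v in zip(means, variances)]
    return max(range(len(samples)), key=samples.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    print(geometric_prefix_grid(16))
    print(thompson_step([0.1, 0.9, 0.3], [0.05, 0.05, 0.05], rng))
```

With a grid like this, the number of prefixes grows only logarithmically in the dataset size, which is the practical appeal of subsampling over evaluating every prefix of the predictive sequence.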
