Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training

Ismail Labiad; Mathurin Videau; Matthieu Kowalski; Marc Schoenauer; Alessandro Leite; Julia Kempe; Olivier Teytaud

arXiv:2507.01752·cs.LG·January 6, 2026

TL;DR

This paper presents BBoxER, a black-box optimization method for LLM post-training that enhances privacy, robustness, and generalization, backed by theoretical guarantees and empirical validation on reasoning datasets.

Contribution

Introduces BBoxER, a novel black-box optimization approach for LLM post-training with provable generalization bounds and privacy guarantees, addressing security concerns of gradient-based methods.

Findings

01. A few iterations of BBoxER suffice to improve LLM performance.

02. BBoxER generalizes well on reasoning benchmarks.

03. BBoxER is robust against membership inference and data poisoning attacks.

Abstract

Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black-box optimization methods, which treat the model as an opaque function and rely solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees…
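To make the phrase "relying solely on function evaluations" concrete, here is a minimal sketch of a generic (1+1) evolution strategy with a one-fifth-style step-size rule. It is not the BBoxER algorithm from the paper: the function names (`one_plus_one_es`, `sphere`), the toy objective, and all hyperparameters are assumptions chosen for illustration. The point it shows is that the optimizer never sees gradients or individual data points, only whether a candidate scored at least as well as the incumbent.

```python
import random


def one_plus_one_es(loss, x0, sigma=0.5, budget=200, seed=0):
    """Minimise `loss` using only function evaluations (no gradients).

    Hypothetical illustration: a textbook (1+1) evolution strategy,
    not the method described in the paper.
    """
    rng = random.Random(seed)
    best_x = list(x0)
    best_loss = loss(best_x)
    history = [best_loss]
    for _ in range(budget):
        # Propose a Gaussian perturbation of the current best point.
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in best_x]
        cand_loss = loss(candidate)
        # The update depends only on the outcome of one comparison.
        if cand_loss <= best_loss:
            best_x, best_loss = candidate, cand_loss
            sigma *= 1.5            # success: widen the search
        else:
            sigma *= 1.5 ** -0.25   # failure: narrow the search
        history.append(best_loss)
    return best_x, best_loss, history


def sphere(x):
    """Toy objective (assumption): squared Euclidean norm."""
    return sum(xi * xi for xi in x)


start = [3.0, -2.0, 1.0]
best_x, best_loss, history = one_plus_one_es(sphere, start)
print(f"initial loss = {sphere(start):.3f}, final loss = {best_loss:.6f}")
```

Because each iteration passes a single accept-or-reject decision back to the search, the total information the optimizer extracts from the objective is limited by the evaluation budget. In an LLM post-training setting the toy `sphere` function would be replaced by a benchmark score computed on training data, which is the kind of restricted information flow the abstract refers to.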

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques