Tuning without Peeking: Provable Generalization Bounds and Robust LLM Post-Training
Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud

TL;DR
This paper presents BBoxER, a black-box optimization method for LLM post-training that enhances privacy, robustness, and generalization, backed by theoretical guarantees and empirical validation on reasoning datasets.
Contribution
Introduces BBoxER, a novel black-box optimization approach for LLM post-training with provable generalization bounds and privacy guarantees, addressing security concerns of gradient-based methods.
Findings
A few iterations of BBoxER are enough to improve LLM performance.
BBoxER generalizes well on reasoning benchmarks.
BBoxER is robust against membership inference and data poisoning attacks.
Abstract
Gradient-based optimization is the workhorse of deep learning, offering efficient and scalable training via backpropagation. However, exposing gradients during training can leak sensitive information about the underlying data, raising privacy and security concerns such as susceptibility to data poisoning attacks. In contrast, black-box optimization methods, which treat the model as an opaque function and rely solely on function evaluations to guide optimization, offer a promising alternative in scenarios where data access is restricted, adversarial risks are high, or overfitting is a concern. This paper introduces BBoxER, an evolutionary black-box method for LLM post-training that induces an information bottleneck via implicit compression of the training data. Leveraging the tractability of information flow, we provide non-vacuous generalization bounds and strong theoretical guarantees…
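To make the idea of optimizing from function evaluations alone concrete, here is a minimal sketch of a generic (1+1) evolution strategy in Python. It is not the BBoxER algorithm from the paper: the function names, the `toy_score` objective, and the `budget` and `sigma` values are all assumptions chosen for illustration. The point it shows is that the optimizer never sees a gradient, only a score, and that the data influences the search solely through one accept/reject decision per iteration.

```python
import random


def one_plus_one_es(score, x0, budget, sigma=0.1, seed=0):
    """Maximize `score` using only function evaluations (no gradients).

    Generic (1+1) evolution strategy, used here as an illustrative
    stand-in for a black-box optimizer. All names are hypothetical.
    """
    rng = random.Random(seed)
    best = list(x0)
    best_score = score(best)
    for _ in range(budget):
        # Propose a random Gaussian perturbation of the current parameters.
        candidate = [xi + sigma * rng.gauss(0.0, 1.0) for xi in best]
        cand_score = score(candidate)
        # The only feedback taken from the objective is this comparison.
        if cand_score > best_score:
            best, best_score = candidate, cand_score
    return best, best_score


def toy_score(x):
    # Hypothetical stand-in for a benchmark score; peaks at x = (1, 1, 1).
    return -sum((xi - 1.0) ** 2 for xi in x)


start = [0.0, 0.0, 0.0]
final, final_score = one_plus_one_es(toy_score, start, budget=200)
```

Because the loop keeps a candidate only when it scores strictly higher, the returned score can never be worse than the starting one. In an actual post-training setting, `score` would presumably be an evaluation of the model on training data, and the parameter vector would be a small set of tunable weights or modifiers rather than three floats.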
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
