Tail-Aware Information-Theoretic Generalization for RLHF and SGLD
Huiming Zhang, Binghan Li, Wan Tian, Qiang Sun

TL;DR
This paper introduces a tail-aware information-theoretic framework for analyzing generalization in settings with heavy-tailed data, extending classical bounds to sub-Weibull distributions.
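To make the sub-Weibull family concrete, here is a small Python sketch. It assumes one common parameterization, which is not necessarily the paper's. Under it, X is sub-Weibull with tail parameter θ when P(|X| > t) ≤ 2 exp(-(t/K)^{1/θ}), so θ = 1/2 gives sub-Gaussian tails and θ = 1 gives sub-exponential tails. An equivalent characterization, up to constants, is ||X||_p ≤ C p^θ for all p ≥ 1.

The sketch checks this moment-growth form on a standard Weibull variable with survival function exp(-t^{1/θ}). For that variable, E[X^p] = Γ(1 + pθ). The helper names are hypothetical.

```python
import math


def weibull_lp_norm(p: float, theta: float) -> float:
    """(E[X^p])^(1/p) for a standard Weibull X with P(X > t) = exp(-t**(1/theta)).

    For this X, E[X^p] = Gamma(1 + p*theta); lgamma keeps large p from overflowing.
    """
    return math.exp(math.lgamma(1.0 + p * theta) / p)


def moment_ratio(p: float, theta: float) -> float:
    """||X||_p / p**theta, which stays bounded over p >= 1 for a sub-Weibull(theta) X."""
    return weibull_lp_norm(p, theta) / p ** theta


if __name__ == "__main__":
    # Assumed convention: theta = 0.5 sub-Gaussian, theta = 1 sub-exponential, theta = 2 heavier.
    for theta in (0.5, 1.0, 2.0):
        ratios = [moment_ratio(2 ** j, theta) for j in range(7)]  # p = 1, 2, ..., 64
        print(theta, [round(r, 3) for r in ratios])
```

By Stirling's formula, the ratio settles near (θ/e)^θ as p grows. Dividing by a smaller power p^{θ'} with θ' < θ instead makes the ratio grow without bound. This is how the growth rate of the moments pins down the tail parameter.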
Contribution
It develops a novel tail-dependent divergence and establishes sharp inequalities for sub-Weibull processes, enabling new generalization bounds in heavy-tailed scenarios.
Findings
Provides PAC-Bayes bounds for heavy-tailed data
Derives a chaining inequality based on Rényi mutual information
Applies the framework to RLHF and SGLD with heavy-tailed noise
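The Rényi quantities mentioned in the findings can be illustrated numerically with the Python sketch below. It computes the Rényi divergence of order α between finite distributions:

D_α(P‖Q) = (1/(α−1)) log Σ_i p_i^α q_i^{1−α}

This divergence recovers the KL divergence as α → 1 and is nondecreasing in α. For mutual information, the sketch uses one common definition, I_α(X;Y) = D_α(P_XY ‖ P_X ⊗ P_Y). Several inequivalent Rényi mutual informations exist (Sibson's, for example), and the paper's definition may differ. The function names and the 2×2 example are hypothetical.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(P || Q) for finite distributions; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Renyi divergence of order alpha > 0 between finite distributions.

    Assumes q is positive wherever p is positive (absolute continuity).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        raise ValueError("q must be positive wherever p is positive")
    if math.isclose(alpha, 1.0):
        return kl_divergence(p, q)  # the alpha -> 1 limit
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)


def renyi_mutual_information(joint: List[List[float]], alpha: float) -> float:
    """D_alpha(P_XY || P_X x P_Y): one of several Renyi mutual-information variants."""
    px = [sum(row) for row in joint]          # marginal of X (rows)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (columns)
    p = [joint[i][j] for i in range(len(px)) for j in range(len(py))]
    q = [px[i] * py[j] for i in range(len(px)) for j in range(len(py))]
    return renyi_divergence(p, q, alpha)


if __name__ == "__main__":
    joint = [[0.4, 0.1], [0.1, 0.4]]  # hypothetical correlated pair
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(alpha, round(renyi_mutual_information(joint, alpha), 4))
```

Because D_α is nondecreasing in α, bounds stated for α > 1 are never smaller than the KL-based quantity at α = 1.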
Abstract
Classical information-theoretic generalization bounds typically control the generalization gap through KL-based mutual information. They therefore rely on boundedness or on sub-Gaussian tails, via the moment generating function (MGF). In many modern pipelines, such as robust learning, RLHF, and stochastic optimization, losses and rewards can be heavy-tailed and MGFs may not exist, which renders KL-based tools ineffective. We develop a tail-dependent information-theoretic framework for sub-Weibull data, in which a tail parameter controls tail heaviness: one value of the parameter corresponds to sub-Gaussian tails, another to sub-exponential tails, and a third regime to genuinely heavy tails. Our key technical ingredient is a decorrelation lemma that bounds change-of-measure expectations using a shifted-log divergence, which admits explicit comparisons to Rényi divergence without MGF arguments. On…
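The claim that MGFs may not exist for heavy tails can be checked with an elementary bound. By Markov's inequality applied to e^{λX}, E[e^{λX}] ≥ e^{λt} P(X ≥ t) for every λ > 0 and t > 0.

The Python sketch below evaluates the log of this lower bound for a Weibull-type tail, P(X ≥ t) = exp(-t^{1/θ}). This is an assumed parameterization in which θ = 1/2 gives sub-Gaussian tails and θ = 1 gives sub-exponential tails; it is a common convention, and the paper's may differ.

When θ > 1, the exponent λt − t^{1/θ} grows without bound, so E[e^{λX}] = ∞ for every λ > 0. This is why arguments that pass through a log-MGF term, as in the Donsker-Varadhan variational formula behind KL-based bounds, break down. The function names are illustrative.

```python
import math
from typing import Iterable


def log_mgf_lower_bound(lam: float, t: float, theta: float) -> float:
    """log of exp(lam*t) * P(X >= t) when P(X >= t) = exp(-t**(1/theta)).

    By Markov's inequality, E[exp(lam*X)] >= exp(lam*t) * P(X >= t) for every t > 0.
    """
    return lam * t - t ** (1.0 / theta)


def best_lower_bound(lam: float, theta: float, ts: Iterable[float]) -> float:
    """Largest lower bound on log E[exp(lam*X)] over a grid of thresholds t."""
    return max(log_mgf_lower_bound(lam, t, theta) for t in ts)


if __name__ == "__main__":
    lam = 0.01
    for k in (2, 4, 6, 8):
        grid = [10.0 ** j for j in range(k + 1)]
        # theta = 2 (heavier than sub-exponential): the bound keeps growing with the grid.
        # theta = 0.5 (sub-Gaussian): the bound stays bounded.
        print(k, best_lower_bound(lam, 2.0, grid), best_lower_bound(lam, 0.5, grid))
```

For θ = 1/2, the exponent λt − t² is maximized at t = λ/2, where it equals λ²/4. It therefore stays bounded, which is consistent with a finite MGF in the sub-Gaussian regime.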
