Statistical Hypothesis Testing for Auditing Robustness in Language Models
Paulius Rauba, Qiyao Wei, Mihaela van der Schaar

TL;DR
This paper introduces a distribution-based hypothesis testing framework for auditing large language models, enabling robust evaluation of output changes under various perturbations with interpretable statistical measures.
Contribution
It presents a novel, model-agnostic hypothesis testing method that quantifies LLM output changes, supporting arbitrary perturbations and providing interpretable p-values and effect sizes.
Findings
Framework supports multiple perturbations with controlled error rates
Enables quantification of response changes and true/false positive rates
Demonstrates effectiveness across multiple case studies
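When several perturbations are tested at once, one common way to keep the family-wise error rate controlled is a step-down correction over the resulting p-values. The sketch below uses the Holm-Bonferroni procedure as an illustration; the summary does not say which correction the paper uses, so the procedure and the name `holm_reject` are assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure (illustrative choice, not taken from the paper).

    Returns one boolean per input p-value: True if that hypothesis is rejected
    while keeping the family-wise error rate at or below alpha.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (0-based rank) is compared to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one test fails, all larger p-values are retained as well.
            break
    return reject
```

Each entry of `p_values` would correspond to one perturbation, for example one prompt rewrite or one model variant.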
Abstract
Consider the problem of testing whether the outputs of a large language model (LLM) system change under an arbitrary intervention, such as an input perturbation or changing the model variant. We cannot simply compare two LLM outputs since they might differ due to the stochastic nature of the system, nor can we compare the entire output distribution due to computational intractability. While methods for analyzing text-based outputs exist, they focus on fundamentally different problems, such as measuring bias or fairness. To this end, we introduce distribution-based perturbation analysis, a framework that reformulates LLM perturbation analysis as a frequentist hypothesis testing problem. We construct empirical null and alternative output distributions within a low-dimensional semantic similarity space via Monte Carlo sampling, enabling tractable inference without restrictive…
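The idea in the abstract can be sketched as a two-sample permutation test over similarity scores. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `permutation_test`, the choice of the absolute difference in means as the test statistic, and the add-one p-value correction are all illustrative. It assumes the similarity scores have already been computed, for example as cosine similarities between embeddings of sampled responses.

```python
import random
from statistics import mean


def permutation_test(null_sims, alt_sims, n_permutations=2000, seed=0):
    """Two-sample permutation test on similarity scores (illustrative sketch).

    null_sims: similarities among responses sampled under the original input.
    alt_sims:  similarities between original and perturbed responses.
    Returns (effect, p_value), where effect is the absolute difference in means.
    """
    rng = random.Random(seed)
    observed = abs(mean(null_sims) - mean(alt_sims))
    pooled = list(null_sims) + list(alt_sims)
    n_null = len(null_sims)
    hits = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling the pooled scores simulates the null distribution.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_null]) - mean(pooled[n_null:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up placeholder scores for demonstration only; not results from the paper.
    null_scores = [0.91, 0.93, 0.90, 0.94, 0.92, 0.95, 0.89, 0.93, 0.92, 0.94]
    alt_scores = [0.71, 0.68, 0.75, 0.70, 0.66, 0.73, 0.69, 0.72, 0.74, 0.67]
    effect, p = permutation_test(null_scores, alt_scores)
    print(f"effect size (difference in means): {effect:.3f}, p-value: {p:.4f}")
```

A small p-value suggests the perturbation shifted the output distribution beyond what sampling noise alone would explain, and the effect value indicates how large the shift is in similarity space.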
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods
