FragileFlow: Spectral Control of Correct-but-Fragile Predictions for Foundation Model Robustness
Zhuoyun Li, Boxuan Wang, Jinwei Hu, Xiaowei Huang, Yi Dong

TL;DR
FragileFlow is a plug-in regularizer that improves foundation model robustness by identifying predictions that are correct but fragile near the decision boundary and applying spectral control to the class-wise matrix of their off-class probability mass.
Contribution
It formalizes margin-aware error flow, introduces a spectral regularizer, and provides the first PAC-Bayes upper bound for this robustness measure.
Findings
FragileFlow improves worst-class accuracy on multiple benchmarks.
It preserves clean accuracy while enhancing robustness.
Theoretical bounds support the effectiveness of spectral control.
Abstract
Robust adaptation of LLMs and VLMs is often evaluated by average accuracy or average consistency under perturbations. However, these averages can hide a structured failure mode: a prediction may remain correct while probability mass already flows from particular true classes toward systematic wrong competitors near the decision boundary. In this paper, we formalize this phenomenon as margin-aware error flow and introduce FragileFlow, a plug-in regularizer that uses a calibrated margin buffer to identify correct-but-fragile predictions and organize their off-class probability mass into a class-wise vulnerable-risk matrix. Theoretically, we provide the first PAC-Bayes upper bound for this margin-aware error-flow object, showing how empirical spectral control yields a conservative route to deterministic worst-class robustness under a stability condition. Experiments on multiple-choice LLM…
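The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed buffer `tau` (the paper calibrates its buffer; here it is simply passed in), the per-class normalization, and the use of the largest singular value as the spectral quantity are all assumptions made for illustration. A prediction counts as correct-but-fragile when the true class still has the highest probability but leads the strongest competitor by less than `tau`.

```python
import numpy as np

def vulnerable_risk_matrix(probs, labels, tau):
    """Hypothetical sketch: class-wise matrix of off-class probability mass
    carried by correct-but-fragile predictions.

    probs  : (n, k) array of predicted class probabilities
    labels : (n,) array of integer true classes
    tau    : margin buffer (assumed fixed here, not calibrated)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, k = probs.shape
    idx = np.arange(n)

    # Margin = true-class probability minus the strongest competitor.
    p_true = probs[idx, labels]
    masked = probs.copy()
    masked[idx, labels] = -np.inf
    margin = p_true - masked.max(axis=1)

    # Correct (margin > 0) but inside the buffer (margin < tau).
    fragile = (margin > 0) & (margin < tau)

    risk = np.zeros((k, k))
    for c in range(k):
        in_class = labels == c
        if not in_class.any():
            continue
        off = probs[in_class & fragile].copy()
        off[:, c] = 0.0  # keep only mass flowing to wrong classes
        # Assumed normalization: divide by the number of samples in class c.
        risk[c] = off.sum(axis=0) / in_class.sum()
    return risk, fragile

def spectral_norm(matrix):
    """Largest singular value, used here as an illustrative spectral penalty."""
    return float(np.linalg.norm(matrix, 2))

probs = [[0.50, 0.40, 0.10],
         [0.90, 0.05, 0.05],
         [0.20, 0.70, 0.10],
         [0.30, 0.45, 0.25]]
labels = [0, 0, 1, 1]
risk, fragile = vulnerable_risk_matrix(probs, labels, tau=0.2)
penalty = spectral_norm(risk)
```

In this toy batch every prediction is correct, so average accuracy is perfect, yet two of the four samples fall inside the buffer. Row `c` of `risk` records which wrong classes absorb the probability mass of fragile class-`c` samples, which is the kind of structure an average would hide.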
