Unveiling the Basin-Like Loss Landscape in Large Language Models
Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu

TL;DR
This paper uncovers basin-like structures in the loss landscape of large language models, showing how model scale and fine-tuning influence stability, capabilities, and robustness, with implications for improving model training and safety.
Contribution
It introduces the concept of basins in LLM loss landscapes, linking model scale, fine-tuning, and robustness, and provides theoretical bounds on performance degradation.
Findings
Larger models have more expansive stability basins.
Benign fine-tuning within basins preserves capabilities.
Adversarial fine-tuning exploits worst-case directions, degrading performance.
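The contrast between random and worst-case directions in the findings above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's experiment: it assumes a linear classifier in place of an LLM, and the names `accuracy`, `rand_dir`, and `worst_dir` are hypothetical. The worst-case direction is chosen by construction (straight toward the negated weights), which is only possible because the toy model is so simple.

```python
import numpy as np

def accuracy(w, X, y):
    """0-1 accuracy of the toy linear classifier sign(X @ w)."""
    return float(np.mean((X @ w > 0) == y))

rng = np.random.default_rng(0)
dim = 256
w = rng.standard_normal(dim)          # stand-in for trained parameters
X = rng.standard_normal((2000, dim))  # synthetic inputs
y = X @ w > 0                         # labels the unperturbed model gets right

# Use the same perturbation radius for both directions.
radius = 2.0 * np.linalg.norm(w)

# "Most-case": a random unit-norm direction in parameter space.
rand_dir = rng.standard_normal(dim)
rand_dir /= np.linalg.norm(rand_dir)

# "Worst-case" (by construction for this toy model): move toward -w.
worst_dir = -w / np.linalg.norm(w)

acc_random = accuracy(w + radius * rand_dir, X, y)
acc_worst = accuracy(w + radius * worst_dir, X, y)
print(f"random direction: {acc_random:.3f}, worst-case direction: {acc_worst:.3f}")
```

At equal radius, the random step leaves the toy classifier well above chance, while the constructed worst-case step flips essentially every prediction. This mirrors, in a very simplified setting, why a direction chosen adversarially can be far more damaging than a typical one.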
Abstract
We discover the emergence of basins in the loss landscape of large language models. As model scale increases, LLMs become progressively more resilient to random perturbations in the parameter space, giving rise to expansive stability regions where models exhibit nearly identical performance, but outside of which their capabilities collapse. We observe that pre-training creates a basic capability basin, and subsequent alignment fine-tuning forms specific capability basins (e.g., safety, math, coding). Thus, we argue that benign fine-tuning confined to the basin should preserve prior capabilities. We also analyze the loss landscape along worst-case directions, which is consistently sharp and detrimental. We find that adversarial fine-tuning moves along nearly worst-case directions, thus rapidly degrading model capabilities. Finally, we provide a…
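The resilience to random perturbations described in the abstract can be probed by averaging a 0-1 metric over many random 1D slices through the parameters. The sketch below shows only the measurement procedure, under loud assumptions: a toy linear classifier stands in for an LLM, `basin_profile` is a hypothetical helper, and the radii are arbitrary. A linear model degrades smoothly rather than showing the flat plateau the paper reports, so the numbers carry no evidential weight.

```python
import numpy as np

def accuracy(w, X, y):
    """0-1 accuracy of the toy linear classifier sign(X @ w)."""
    return float(np.mean((X @ w > 0) == y))

def basin_profile(w, X, y, radii, n_dirs=32, seed=0):
    """Mean and std of accuracy over random 1D slices through w."""
    rng = np.random.default_rng(seed)
    accs = np.empty((n_dirs, len(radii)))
    for i in range(n_dirs):
        d = rng.standard_normal(w.shape)
        d /= np.linalg.norm(d)  # unit-norm random direction
        for j, r in enumerate(radii):
            accs[i, j] = accuracy(w + r * d, X, y)
    # The std across slices shows how much individual directions deviate
    # from the averaged picture.
    return accs.mean(axis=0), accs.std(axis=0)

rng = np.random.default_rng(1)
dim = 256
w_star = rng.standard_normal(dim)     # stand-in for trained parameters
X = rng.standard_normal((2000, dim))  # synthetic inputs
y = X @ w_star > 0                    # labels the unperturbed model gets right

radii = np.array([0.0, 1.0, 4.0, 16.0, 64.0, 256.0])
mean_acc, std_acc = basin_profile(w_star, X, y, radii)
for r, m, s in zip(radii, mean_acc, std_acc):
    print(f"radius {r:6.1f}: accuracy {m:.3f} +/- {s:.3f}")
```

Reporting the standard deviation alongside the mean is a deliberate choice here: an averaged curve can hide slices along which performance collapses much earlier than the mean suggests.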
Peer Reviews
Decision: ICLR 2026 Poster
- The analysis of the local loss geometry of LLMs is sound. It might lead to a deeper theoretical understanding of LLM training through flatness [cf 6, 7].
- The partitioning into most-case and worst-case loss surfaces is interesting and novel.
- The empirical analysis is comprehensive. The normalization across heterogeneous experiments enables consistent comparison across diverse generative tasks and is a nontrivial engineering effort.
- The GA-optimizer is a tangible output of the paper.
- The p
- Basins only occur for the 0-1 loss and not for likelihoods. This could hint at basins being a byproduct of thresholding rather than a genuine property of the loss surface. The authors are open about this limitation, though, so I do not see this as a reason for rejection.
- Averaging over many samples of 1D slices is reasonable to obtain a big picture, but it would be interesting to look at deviations, e.g., by displaying the variance of the basin. It could be, after all, that the ba
- The different loss landscape patterns and their relation to catastrophic forgetting are an interesting finding.
- The theoretical analysis shows that the basin size bounds the performance degradation of any fine-tuning.
- The concept of a landscape is intuitive but lacks rigorous definitions. For example, "most-case landscape" and "worst-case landscape" are defined only qualitatively, not quantitatively.
- The connection between this loss landscape and other research topics is unclear, e.g., how does the "saddle point" concept fit into this framework?
- In a high-dimensional parameter space, the number of possible directions is infinite. In this work, only a few fine-tuning directions are tested.
- I appreciate the perspective that the authors take; flatness seems a powerful tool for analyzing the learning behaviour of LLMs.
- I appreciate the introduction of the Gaussian-augmented optimizer.
- The paper remains very high level, primarily reporting the finding of "basins", but lacks sufficiently convincing and rigorous formal and empirical analyses.
- The authors report that different models (e.g., Llama, Qwen, Mistral) have different basin sizes and conjecture that this could mean some of these models are more prone to compromising safety when fine-tuned, without providing clear, solid reasoning or actual evidence.
- The authors report that substituting some tokens preserves performance
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
