Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang

TL;DR
This survey comprehensively reviews the robustness of Large Language Models, covering adversarial and out-of-distribution challenges, evaluation methods, and future research directions to enhance their reliability in diverse applications.
Contribution
It provides a formal definition of LLM robustness, organizes existing work by input perturbation types, and highlights future research opportunities in the field.
Findings
Organized robustness categories: adversarial, OOD, evaluation
Summarized new datasets and metrics for robustness assessment
Highlighted future directions for improving LLM reliability
Abstract
Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural language. With their rapid development and wide-ranging applications (e.g., Agents, Embodied Intelligence), the robustness of LLMs has received increasing attention. As the core brain of many AI applications, LLMs must not only generate consistent content but also ensure the correctness and stability of that content when dealing with unexpected application scenarios (e.g., toxic prompts, limited noise domain data, out-of-distribution (OOD) applications, etc.). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods around this field and facilitate the community. Specifically, we first give a formal definition of LLM…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
