An Empirical Study of Sample Selection Strategies for Large Language Model Repair
Xuran Li, Jingyi Wang

TL;DR
This study systematically compares sample selection strategies for repairing large language models, finding that semantic-aware prioritization offers the best balance of effectiveness and efficiency, with simpler methods often sufficing for large models.
Contribution
It introduces a new semantic-aware selection approach and evaluates it alongside four existing sample selection methods for LLM repair, providing insights into optimal data proportions and trade-offs.
Findings
Semantic-Aware Prioritized Sampling (SAPS) achieves the best balance of detoxification and utility preservation.
Random sampling is effective for large or robust models.
High-overhead methods like CCS and GraNd offer limited benefits.
Abstract
Large language models (LLMs) are increasingly deployed in real-world systems, yet they can produce toxic or biased outputs that undermine safety and trust. Post-hoc model repair provides a practical remedy, but the high cost of parameter updates motivates selective use of repair data. Despite extensive prior work on data selection for model training, it remains unclear which sampling criteria are most effective and efficient when applied specifically to behavioral repair of large generative models. Our study presents a systematic analysis of sample prioritization strategies for LLM repair. We evaluate five representative selection methods: random sampling, K-Center, gradient-norm-based selection (GraNd), stratified coverage (CCS), and a Semantic-Aware Prioritized Sampling (SAPS) approach that we propose. Repair effectiveness and trade-offs are assessed through toxicity reduction,…
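K-Center is one of the baselines named in the abstract, but this summary does not describe how it is implemented in the study. As a rough illustration only, the sketch below shows the standard greedy (farthest-first) form of K-Center selection. The function name `k_center_greedy`, the Euclidean distance, and the toy 2-D "embeddings" are all assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def k_center_greedy(points: Sequence[Sequence[float]], k: int, first: int = 0) -> List[int]:
    """Pick k indices by farthest-first traversal (greedy K-Center).

    Hypothetical helper for illustration; the paper's actual setup may differ.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # min_dist[i] is the distance from point i to its nearest selected center.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        # Next center: the point farthest from every center chosen so far.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Toy 2-D "embeddings" (assumed data): two tight clusters plus one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
chosen = k_center_greedy(embeddings, k=3)
print(chosen)
```

On this toy data the selection covers one point from each cluster plus the outlier, which is the coverage-oriented behavior that distinguishes K-Center from random sampling. In a repair setting the points would presumably be sentence or hidden-state embeddings of candidate repair samples, though that choice is an assumption here.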
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning
