A Systematic Study of Training-Free Methods for Trustworthy Large Language Models

Wai Man Si; Mingjie Li; Michael Backes; Yang Zhang

arXiv:2604.15789 · cs.CL · April 20, 2026

TL;DR

This paper systematically evaluates training-free methods for improving trustworthiness in large language models, analyzing their effectiveness, trade-offs, and limitations across various settings and model types.

Contribution

The paper introduces a taxonomy of training-free methods organized by their intervention points and provides a comprehensive analysis of their impact on trustworthiness, utility, and robustness.

Findings

01. Training-free methods vary in effectiveness across trustworthiness dimensions.
02. Trade-offs exist between trustworthiness improvements and utility degradation.
03. The study identifies unresolved challenges and offers practical recommendations.

Abstract

As Large Language Models (LLMs) receive increasing attention and are being deployed across various domains, their potential risks, including generating harmful or biased content, producing unsupported claims, and exhibiting vulnerabilities to adversarial attacks, have drawn significant attention. To enable quick and low-cost adaptation, training-free methods have recently emerged as cost-effective alternatives to post-training alignment techniques. Despite their promising results, these methods are evaluated inconsistently across the literature, cover limited dimensions of trustworthiness, and can introduce undesirable side effects, such as utility degradation and increased brittleness. To fully assess the impacts of these training-free methods, we take a step back and systematically re-evaluate the effectiveness of existing training-free methods against various trustworthy settings and…