FIRST: Teach A Reliable Large Language Model Through Efficient Trustworthy Distillation
KaShun Shum, Minrui Xu, Jianshu Zhang, Zixin Chen, Shizhe Diao, Hanze Dong, Jipeng Zhang, Muhammad Omer Raza

TL;DR
This paper introduces FIRST, a distillation method that makes large language models more trustworthy. It improves accuracy and reduces mis-calibration by efficiently transferring a concentrated portion of the teacher's knowledge.
Contribution
We propose a novel distillation approach called FIRST that leverages concentrated knowledge and trustworthy maximization to produce reliable, well-calibrated language models more efficiently.
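This page does not say how "concentrated knowledge" is extracted, so what follows is only a minimal sketch under stated assumptions. For a single next-token position, it keeps the teacher's k most probable tokens, renormalizes them into a target distribution, and scores the student with a KL divergence against that truncated target. The names `concentrated_target` and `kd_loss`, the cutoff `k`, and the optional `temperature` knob are hypothetical and are not the authors' code. The `temperature` knob is one possible way to adjust the teacher's confidence before transfer, used here as a stand-in for the paper's trustworthy maximization.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def concentrated_target(teacher_logits: List[float], k: int,
                        temperature: float = 1.0) -> Dict[int, float]:
    """Hypothetical helper: keep the teacher's top-k tokens, renormalized.

    `temperature` is an assumed knob for re-scaling the teacher's confidence
    before transfer; T > 1 flattens the target, T < 1 sharpens it.
    """
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    logp = log_softmax([teacher_logits[i] / temperature for i in top])
    return {i: math.exp(lp) for i, lp in zip(top, logp)}


def kd_loss(student_logits: List[float], target: Dict[int, float]) -> float:
    """Hypothetical loss: KL(target || student) over the full vocabulary.

    The target is zero outside its top-k support, so only those k terms
    contribute to the sum.
    """
    log_q = log_softmax(student_logits)
    return sum(p * (math.log(p) - log_q[i]) for i, p in target.items() if p > 0)


teacher_logits = [4.0, 3.0, 1.0, -2.0, -5.0]
target = concentrated_target(teacher_logits, k=2)
print(sorted(target))  # [0, 1]
# A student matching the teacher's top-2 shape scores lower than a uniform one.
print(kd_loss([10.0, 9.0, -50.0, -50.0, -50.0], target) < kd_loss([0.0] * 5, target))  # True
```

The target is zero outside its k-token support, so only k probabilities per position need to be stored and compared. That is one plausible reading of why transferring "a small portion of teacher's knowledge" is cheaper than distilling the full vocabulary distribution.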
Findings
Achieves +2.3% accuracy improvement
Reduces mis-calibration by 10%
Effective in both in-domain and out-of-domain scenarios
Abstract
Large language models (LLMs) have become increasingly prevalent in our daily lives, leading to an expectation that LLMs be trustworthy: both accurate and well-calibrated (the prediction confidence should align with its ground-truth likelihood of being correct). Nowadays, fine-tuning is the most popular method for adapting a model to practical use, as it significantly increases accuracy on downstream tasks. Despite the high accuracy it achieves, we found that fine-tuning still falls far short of satisfactory trustworthiness due to "tuning-induced mis-calibration". In this paper, we examine in depth why and how mis-calibration arises in fine-tuned models, and how distillation can alleviate the issue. We then propose a new method named Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a…
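The abstract defines calibration as agreement between prediction confidence and the likelihood of being correct. A widely used way to quantify mis-calibration is Expected Calibration Error (ECE). It sorts predictions into confidence bins and averages the gap between accuracy and mean confidence per bin, weighted by bin size. The sketch below assumes ten equal-width bins. Whether the paper uses exactly this variant, or this bin count, is an assumption here. The function name `expected_calibration_error` is illustrative.

```python
from typing import List, Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    n = len(confidences)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for idx, c in enumerate(confidences):
        # Bins are [lo, hi); confidence 1.0 is folded into the last bin.
        b = min(int(c * n_bins), n_bins - 1)
        bins[b].append(idx)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        acc = sum(1 for i in members if correct[i]) / len(members)
        conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / n * abs(acc - conf)
    return ece


# Over-confident toy model: 90% confidence, 50% accuracy.
print(round(expected_calibration_error([0.9] * 4, [True, False, True, False]), 3))  # 0.4
```

A model that reports 90% confidence but is right only half the time scores an ECE of about 0.4. A perfectly calibrated model would score 0.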
Taxonomy
Topics: Topic Modeling
Methods: ALIGN
