Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher
Hyunjong Ok, Jegwang Ryu, Jaeho Lee

TL;DR
This paper introduces an adaptive decoding algorithm for small-scale language models that can draw on only a few tokens of supervision from a larger model. The algorithm decides, based on confidence, whether to overtrust or disregard the larger model's predictions on the initial tokens, which improves the quality of the text the small model then generates on its own.
Contribution
The paper proposes a method for aggregating the predictions of a small-scale and a large language model when the large model can supervise only a few initial tokens, with the degree of trust in the large model adapted to confidence.
Findings
Consistent improvement over traditional decoding methods
Effective aggregation of small and large model predictions
Adaptive trust mechanism enhances generation quality
Abstract
How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM supervisions one can use, giving birth to many decoding algorithms that utilize supervision without further training. However, it is still unclear what an effective strategy is under the limited-supervision scenario, where we assume that no more than a few tokens can be generated by LLMs. To this end, we develop an algorithm to effectively aggregate the small-scale LLM and LLM predictions on initial tokens so that the generated tokens can more accurately condition the subsequent token generation by the small-scale LLM alone. Critically, we find that it is essential to adaptively overtrust or disregard the LLM prediction based on the confidence of the…
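The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the function name `aggregate`, the `threshold` and `overtrust` parameters, and the log-linear combination rule are all assumptions chosen for illustration. The confidence score is passed in by the caller, since the abstract is cut off before saying how it is computed.

```python
import math

def aggregate(p_small, p_large, confidence, threshold=0.5, overtrust=2.0):
    """Combine two next-token distributions over the same vocabulary.

    Hypothetical rule (an assumption, not taken from the paper):
    - confidence >= threshold: disregard the large model, return p_small.
    - otherwise: overtrust the large model by extrapolating past it in
      log space with weight `overtrust` > 1, then renormalize:
          log p = (1 - w) * log p_small + w * log p_large
    """
    if confidence >= threshold:
        return list(p_small)
    eps = 1e-12  # guards against log(0)
    scores = [
        (1.0 - overtrust) * math.log(ps + eps) + overtrust * math.log(pl + eps)
        for ps, pl in zip(p_small, p_large)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy three-token vocabulary; the numbers are made up for illustration.
p_small = [0.6, 0.3, 0.1]
p_large = [0.2, 0.7, 0.1]
high_conf = aggregate(p_small, p_large, confidence=0.9)
low_conf = aggregate(p_small, p_large, confidence=0.1)
```

With a high confidence score the small model's distribution is returned unchanged. With a low score the combined distribution puts more mass on the large model's preferred token than the large model itself does, which is one way to read "overtrust". In the limited-supervision setting, a rule of this kind would apply only to the first few tokens, after which the small model decodes alone.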
Taxonomy
Topics: Educational and Psychological Assessments
