Unused information in token probability distribution of generative LLM: improving LLM reading comprehension through calculation of expected values
Krystian Zawistowski

TL;DR
This paper shows that manipulating token probabilities and calculating expected values over the next-token distribution can substantially improve LLM reading comprehension scores on SummEval, with Mixtral beating the GPT-4 0314 result on two metrics.
Contribution
It introduces a method of using expected token values and probability-based sampling to enhance LLM decoding and comprehension performance.
Findings
Improved correlation with human judgement on the SummEval dataset.
Scaling logits by a large temperature increases the entropy of the score distribution, which improves expected-value scoring.
Probability-based tree sampling explores multiple likely generations.
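The difference between greedy scoring and expected-value scoring in the findings above can be illustrated with a toy calculation. This is a minimal sketch, not the paper's implementation: the logits in `logits_a` and `logits_b` are made-up values for the score tokens 1 to 5, and the helper names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_score(score_logits):
    """Score whose token has the highest logit (what greedy decoding emits)."""
    return max(score_logits, key=score_logits.get)

def expected_score(score_logits, temperature=1.0):
    """Expected score under the temperature-scaled distribution."""
    scores = sorted(score_logits)
    probs = softmax([score_logits[s] for s in scores], temperature)
    return sum(s * p for s, p in zip(scores, probs))

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Assumed (made-up) logits for the score tokens "1".."5" on two summaries.
logits_a = {1: 0.5, 2: 1.0, 3: 4.0, 4: 3.5, 5: 1.0}
logits_b = {1: 0.2, 2: 0.5, 3: 4.0, 4: 1.0, 5: 0.3}

# Greedy decoding gives both summaries the same score ...
print(greedy_score(logits_a), greedy_score(logits_b))
# ... while the expected value separates them.
print(expected_score(logits_a), expected_score(logits_b))
# A larger temperature flattens the distribution and raises its entropy.
print(entropy(softmax(list(logits_a.values()), 1.0)),
      entropy(softmax(list(logits_a.values()), 10.0)))
```

In this toy case greedy decoding cannot distinguish the two summaries, while the expected value uses the probability mass that greedy decoding discards.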
Abstract
LLM text decoding is a key component of perceived LLM quality. We present two experiments showing that decoding methods can be improved by manipulating token probabilities. First, we test several LLMs on the SummEval summary-scoring dataset to measure reading comprehension. We compare scores from greedy decoding to expected values computed over the next-token distribution, and we scale the logits by a large temperature to increase the entropy of the scores. This yields a strong improvement in performance on SummEval (in terms of correlation with human judgement): from 6-8% to 13-28% for 7B Mistral and from 20%-46% to 37%-56% for Mixtral, beating the GPT-4 0314 result on two metrics. Part of the gain seems to be related to positional bias. Second, we use a probability-based tree sampling algorithm to examine all of the most probable generations for a given prompt.
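One way to read "probability-based tree sampling" is as a best-first search over the token tree that keeps every branch whose cumulative probability stays above a threshold. The sketch below follows that reading only; the abstract does not specify the algorithm, and `TOY_MODEL`, `next_token_probs`, `min_prob`, and `max_len` are hypothetical stand-ins for a real model and its settings.

```python
import heapq

EOS = "<eos>"

# Hypothetical toy "model": maps a prefix to its next-token probabilities.
TOY_MODEL = {
    (): {"good": 0.6, "bad": 0.4},
    ("good",): {"summary": 0.7, "text": 0.3},
    ("bad",): {"summary": 0.9, "text": 0.1},
}

def next_token_probs(prefix):
    """Stand-in for an LLM forward pass; unknown prefixes end the sequence."""
    return TOY_MODEL.get(prefix, {EOS: 1.0})

def tree_sample(min_prob=0.1, max_len=5):
    """Return finished sequences with probability >= min_prob, most probable first."""
    heap = [(-1.0, ())]  # max-heap via negated cumulative probability
    finished = []
    while heap:
        neg_prob, prefix = heapq.heappop(heap)
        prob = -neg_prob
        ended = bool(prefix) and prefix[-1] == EOS
        if ended or len(prefix) >= max_len:
            finished.append((prob, prefix))
            continue
        for token, p in next_token_probs(prefix).items():
            child_prob = prob * p
            if child_prob >= min_prob:  # prune unlikely branches
                heapq.heappush(heap, (-child_prob, prefix + (token,)))
    return finished

results = tree_sample()
for prob, seq in results:
    print(round(prob, 3), " ".join(seq))
```

Because a child can never be more probable than its parent, popping from the heap yields finished sequences in non-increasing order of probability, so the search can also be stopped early after the first few results.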
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Multi-Head Attention · Weight Decay · Linear Warmup With Cosine Annealing · Adam · Residual Connection · Byte Pair Encoding
