TL;DR
This paper introduces SIFT, a method that improves large language model reasoning accuracy by generating and refining "Stickers" that explicitly highlight key information in the context during inference, reaching a state-of-the-art pass@1 accuracy of 85.67% on AIME2024.
Contribution
SIFT is a post-training approach that uses self-generated stickers to explicitly emphasize key context information, enhancing reasoning fidelity in LLMs.
Findings
SIFT improves accuracy on benchmarks like GSM8K and MATH-500.
SIFT achieves a new state-of-the-art pass@1 accuracy of 85.67% on AIME2024.
SIFT demonstrates consistent performance gains across models from 3B to 100B+ parameters.
Abstract
This paper identifies that misinterpretation of the context can be a significant issue during the reasoning process of large language models, from smaller models like Llama3.2-3B-Instruct to cutting-edge ones like DeepSeek-R1. For example, in the phrase "10 dollars per kilo," LLMs might not recognize that "per" means "for each," leading to calculation errors. We introduce a novel, post-training approach called **Stick to the Facts (SIFT)** to tackle this. SIFT leverages increasing inference-time compute to ground LLM reasoning in contexts. At the core of SIFT lies the *Sticker*, which is generated by the model itself to explicitly emphasize the key information within the context. Given the curated Sticker, SIFT generates two predictions -- one from the original query and one from the query augmented with the Sticker. If they differ, the Sticker is sequentially refined via…
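The compare-and-refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `sift`, `generate_sticker`, `predict`, `refine`, and `max_rounds` are hypothetical names, the prompt format is an assumption, and because the abstract is cut off before it describes how the Sticker is refined, `refine` is left as an opaque caller-supplied function. What happens when the two predictions never agree is also an assumption here (the sketch simply returns the Sticker-augmented answer and flags the disagreement). The toy stubs stand in for a real model and reuse the "10 dollars per kilo" example.

```python
from typing import Callable, Tuple


def sift(
    query: str,
    generate_sticker: Callable[[str], str],
    predict: Callable[[str], str],
    refine: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, str, bool]:
    """Hypothetical consensus loop; returns (answer, sticker, agreed)."""
    # The model writes its own Sticker: a short restatement of the key facts.
    sticker = generate_sticker(query)
    # Prediction from the original query alone (computed once; an assumption).
    plain = predict(query)
    # Prediction from the query augmented with the Sticker (assumed prompt format).
    augmented = predict(f"{query}\n\nSticker: {sticker}")
    rounds = 0
    while plain != augmented and rounds < max_rounds:
        # The refinement procedure is not specified in the truncated abstract,
        # so it is delegated to a caller-supplied function.
        sticker = refine(query, sticker)
        augmented = predict(f"{query}\n\nSticker: {sticker}")
        rounds += 1
    return augmented, sticker, plain == augmented


# Toy stand-ins for a real LLM, for illustration only.
def toy_generate_sticker(query: str) -> str:
    # A flawed first Sticker that misreads "per kilo" as a total price.
    return "price: 10 dollars in total"


def toy_refine(query: str, sticker: str) -> str:
    # A corrected Sticker that spells out "per" as "for each".
    return "price: 10 dollars for each kilo; quantity: 3 kilos"


def toy_predict(prompt: str) -> str:
    # Answers 10 when misled by the flawed Sticker, otherwise 3 * 10 = 30.
    return "10" if "in total" in prompt else "30"
```

With the toy stubs, calling `sift("Apples cost 10 dollars per kilo. How much do 3 kilos cost?", toy_generate_sticker, toy_predict, toy_refine)` disagrees on the first comparison, refines the Sticker once, and then reaches agreement.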
