Temporal Guidance for Large Language Models
Hong-Kai Zheng, Piji Li

TL;DR
This paper introduces Temporal Guidance (TeGu), a novel contrastive decoding strategy for large language models that improves generation quality efficiently by leveraging local temporal preferences and a lightweight Multi-Token Prediction mechanism.
Contribution
The paper proposes TeGu, a new contrastive guidance method along the temporal dimension that enhances LLM output quality with minimal additional computation and memory.
Findings
TeGu outperforms existing methods on various benchmarks.
It maintains low computational overhead while improving generation quality.
TeGu is effective across different model sizes.
Abstract
Contrastive Decoding (CD) enhances the generation quality of large language models (LLMs) but incurs significant additional computational overhead due to the need for an auxiliary model. Existing internal self-contrastive decoding methods, such as Decoding by Contrasting Layers (DoLa), focus on discrepancies across different layers, which are notably unstable on small-scale models. In this work, based on the observation that LLMs exhibit local preferences, we propose a novel contrastive guidance strategy along the temporal dimension, namely Temporal Guidance (TeGu). Our method ingeniously leverages Multi-Token Prediction (MTP) to construct weaker amateur predictions for model self-contrast. To standardize the implementation of this mechanism, we further introduce a lightweight Conditional MTP Projector (cMTPP), which avoids maintaining multiple independent networks as required by other…
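The contrast between a stronger and a weaker prediction described in the abstract can be illustrated with a generic contrastive-decoding scoring rule. The sketch below is not the paper's formulation: the scoring formula, the plausibility cutoff, the parameters `alpha` and `tau`, and all function names are assumptions chosen for illustration. According to the abstract, TeGu obtains the amateur distribution from Multi-Token Prediction; here it is simply passed in as a list of logits.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=0.5, tau=0.1):
    """Score each token as (1 + alpha) * log p_expert - alpha * log p_amateur.

    Hypothetical generic rule, not taken from the paper. Tokens whose expert
    probability is below tau * max(p_expert) are masked out with -inf so the
    contrast cannot promote tokens the expert finds implausible.
    """
    lp_expert = log_softmax(expert_logits)
    lp_amateur = log_softmax(amateur_logits)
    cutoff = math.log(tau) + max(lp_expert)
    return [
        (1 + alpha) * e - alpha * a if e >= cutoff else float("-inf")
        for e, a in zip(lp_expert, lp_amateur)
    ]


def pick_token(expert_logits, amateur_logits, alpha=0.5, tau=0.1):
    """Greedy choice: index of the highest contrastive score."""
    scores = contrastive_scores(expert_logits, amateur_logits, alpha, tau)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary of three tokens. The expert barely prefers token 0,
# while the amateur strongly prefers it, so the contrast favors token 1.
expert = [2.0, 1.9, -3.0]
amateur = [2.0, 0.0, 1.0]
print(pick_token(expert, amateur, alpha=0.5))  # contrast applied
print(pick_token(expert, amateur, alpha=0.0))  # plain greedy decoding
```

Setting `alpha` to 0 recovers ordinary greedy decoding from the expert, which makes the effect of the contrast term easy to isolate.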
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
