Extending Token Computation for LLM Reasoning

Bingli Liao; Danilo Vasconcellos Vargas

arXiv:2403.14932 · cs.CL · June 25, 2024 · 1 citation

TL;DR

This paper introduces a novel method for extending token computation in LLMs by optimizing the attention mechanism, which significantly improves reasoning performance, especially in non-STEM domains.

Contribution

It presents a new algorithm that re-balances skewed attention distributions in LLMs, improving reasoning by having downstream layers emulate early-layer attention patterns.
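The re-balancing idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes post-softmax, row-stochastic attention matrices for one downstream layer and one early layer on the same input, and a list of already-flagged outlier ("sink") key positions. For each query, the downstream weights on the sink keys are replaced by the early layer's weights, and the freed mass is spread proportionally over the remaining keys. The function name `rebalance_attention` and the proportional redistribution rule are hypothetical choices made for illustration.

```python
def rebalance_attention(down, early, sinks):
    """Hypothetical sketch: pull a downstream layer's attention on outlier
    ("sink") keys back to an early layer's pattern, rescaling the other keys
    so every row still sums to 1.

    down, early: row-stochastic attention matrices (lists of rows) with the
    same shape, from a downstream and an early layer for the same input.
    sinks: key indices flagged as high-attention non-semantic tokens.
    """
    sink_set = set(sinks)
    out = []
    for d_row, e_row in zip(down, early):
        # Mass the early layer places on the sink keys for this query.
        early_sink = sum(e_row[k] for k in sink_set)
        # Mass the downstream layer places on all non-sink keys.
        down_rest = sum(w for k, w in enumerate(d_row) if k not in sink_set)
        if down_rest == 0.0:
            # Nothing to rescale (row is entirely on sinks): copy the early row.
            out.append(list(e_row))
            continue
        # Scale non-sink weights so the row total stays exactly 1.
        scale = (1.0 - early_sink) / down_rest
        out.append([e_row[k] if k in sink_set else w * scale
                    for k, w in enumerate(d_row)])
    return out


early = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.4, 0.3, 0.3]]
down = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.1, 0.1]]
balanced = rebalance_attention(down, early, sinks=[0])
```

In this toy input, token 0 absorbs most of the downstream layer's attention; after re-balancing, the second query's row becomes roughly `[0.5, 0.5, 0.0]`, matching the early layer's share on the sink. Operating on post-softmax weights keeps the example easy to check; an in-model version might instead act on pre-softmax scores.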

Findings

1. Improved reasoning performance in non-STEM domains.
2. A better understanding of attention dynamics inside LLMs.
3. Significant performance gains from the attention re-balancing algorithm.

Abstract

Large Language Models (LLMs) are pivotal in advancing natural language processing but often struggle with complex reasoning tasks due to inefficient attention distributions. In this paper, we explore the effect of increased computed tokens on LLM performance and introduce a novel method for extending computed tokens in the Chain-of-Thought (CoT) process, utilizing attention mechanism optimization. By fine-tuning an LLM on a domain-specific, highly structured dataset, we analyze attention patterns across layers, identifying inefficiencies caused by non-semantic tokens with outlier high attention scores. To address this, we propose an algorithm that emulates early layer attention patterns across downstream layers to re-balance skewed attention distributions and enhance knowledge abstraction. Our findings demonstrate that our approach not only facilitates a deeper understanding of the…
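The analysis step the abstract describes (finding non-semantic tokens with outlier-high attention scores in each layer) can be sketched as follows. This is an illustrative assumption, not the paper's procedure: it takes head-averaged causal attention matrices as plain lists, scores each key position by the mean attention it receives from the queries allowed to see it, and flags keys whose score exceeds `factor` times the median. The names `received_attention`, `outlier_tokens`, `outliers_per_layer` and the median-ratio threshold are hypothetical.

```python
from statistics import median


def received_attention(attn):
    """Mean attention each key position receives from the queries that can see it.

    attn is a causal, row-stochastic matrix given as a list of rows:
    attn[q][k] is the weight query q places on key k (0 for k > q).
    Averaging only over queries q >= k avoids penalizing late positions,
    which fewer queries are able to attend to.
    """
    n = len(attn)
    return [sum(attn[q][k] for q in range(k, n)) / (n - k) for k in range(n)]


def outlier_tokens(attn, factor=3.0):
    """Key indices whose received attention exceeds `factor` times the median.

    The median is used as the baseline because it is not pulled upward by the
    outliers themselves (hypothetical threshold choice).
    """
    scores = received_attention(attn)
    cutoff = factor * median(scores)
    return [k for k, s in enumerate(scores) if s > cutoff]


def outliers_per_layer(layers, factor=3.0):
    """Map each layer index to the outlier key positions in that layer."""
    return {i: outlier_tokens(a, factor) for i, a in enumerate(layers)}


# Toy 4-token layer where token 0 (e.g. a start-of-sequence token) is a sink.
attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.1, 0.1, 0.0],
    [0.7, 0.1, 0.1, 0.1],
]
print(outlier_tokens(attn))  # → [0]
```

Token 0 receives a mean of 0.85 while every other position receives about 0.1, so only position 0 crosses the cutoff of roughly 0.3. Running `outliers_per_layer` over all layers makes it easy to see at which depth such outliers first appear.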

Code & Models

Repositories

metacarbon/attentionReasoning-llm (PyTorch, official implementation)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques