UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

Wenhao Li; Mingbao Lin; Yunshan Zhong; Shuicheng Yan; Rongrong Ji

arXiv:2406.18173 · cs.CL · September 15, 2025

TL;DR

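UIO-LLMs is an unbiased incremental optimization method for memory-enhanced transformers in long-context settings. It extends the context window of Llama2-7b-chat from 4K to 100K tokens with minimal additional parameters and near-linear inference cost, while reducing training time complexity and addressing bias in gradient computation.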

Contribution

The paper presents a novel unbiased incremental optimization approach for memory-enhanced transformers, enabling efficient long-context processing with minimal parameter overhead.

Findings

01. Extended Llama2-7b-chat context from 4K to 100K tokens

02. Achieved near-linear inference cost with minimal parameter increase

03. Reduced training time complexity through incremental optimization
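
To make the "near-linear inference cost" finding concrete, the sketch below counts query-key interactions for full causal attention versus a segmented scheme with compressed memories. It is only a back-of-the-envelope model, not the paper's cost analysis: the function names, the segment length of 4096, and the figure of 128 memory slots per segment are all assumptions chosen for illustration, as is the assumption that each segment reads the memories of every earlier segment.

```python
def full_attention_cost(n_tokens: int) -> int:
    """Query-key pairs for causal attention over the whole sequence (order n^2)."""
    return n_tokens * (n_tokens + 1) // 2


def segmented_cost(n_tokens: int, segment_len: int, memory_per_segment: int) -> int:
    """Query-key pairs when the sequence is processed segment by segment.

    Assumed model (hypothetical, for illustration only): each segment attends
    causally to itself, and every token in it also reads the compressed
    memories of all earlier segments (memory_per_segment slots each).
    """
    total = 0
    n_prev_segments = 0
    remaining = n_tokens
    while remaining > 0:
        seg = min(segment_len, remaining)
        total += seg * (seg + 1) // 2                        # attention inside the segment
        total += seg * n_prev_segments * memory_per_segment  # reads of earlier memories
        n_prev_segments += 1
        remaining -= seg
    return total


if __name__ == "__main__":
    n = 100_000  # target context length from the findings
    full = full_attention_cost(n)
    seg = segmented_cost(n, segment_len=4096, memory_per_segment=128)  # assumed sizes
    print(f"full attention pairs:      {full}")
    print(f"segmented + memory pairs:  {seg}")
    print(f"ratio (full / segmented):  {full / seg:.1f}")
```

Under these assumptions the memory term still grows with the number of segments, but its constant is small because each segment contributes only a few memory slots, which is one way to read "near-linear" rather than strictly linear.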

Abstract

Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a context segment into memories and leverage these memories to predict outputs of the subsequent segment. Subsequently, by treating our memory-enhanced transformers as fully-connected recurrent neural networks (RNNs), we refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm, which incorporates innovative incremental optimization techniques. These techniques not only diminish time complexity but also address the bias in gradient computation through an…
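
The bias that the abstract attributes to gradient computation under truncation can be seen in a toy setting. The sketch below is not the UIO-LLMs algorithm; it is a generic illustration of Truncated Backpropagation Through Time on a scalar linear RNN, h_t = w * h_{t-1} + x_t with loss L = h_T, where the gradient is computed by hand. The function names and the `window` parameter are hypothetical.

```python
from typing import List, Optional


def forward(w: float, xs: List[float]) -> List[float]:
    """Run the toy RNN h_t = w * h_{t-1} + x_t with h_0 = 0; hs[t] is the state after t inputs."""
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def grad_w(w: float, xs: List[float], window: Optional[int] = None) -> float:
    """Gradient of L = h_T with respect to w.

    window=None backpropagates through all T steps (exact gradient).
    window=k stops after the last k steps, as in truncated BPTT, and
    silently drops the contributions of earlier steps.
    """
    hs = forward(w, xs)
    T = len(xs)
    steps = T if window is None else min(window, T)
    grad = 0.0
    carry = 1.0  # dh_T / dh_t, starting at t = T
    for t in range(T, T - steps, -1):
        grad += carry * hs[t - 1]  # local term: d h_t / d w = h_{t-1}
        carry *= w                 # move one step further back in time
    return grad


if __name__ == "__main__":
    w, xs = 0.5, [1.0, 1.0, 1.0, 1.0]
    print("exact gradient:    ", grad_w(w, xs))
    print("truncated (k = 2): ", grad_w(w, xs, window=2))
```

In this toy case the truncated estimate differs from the exact gradient whenever the dropped early steps carry a nonzero contribution, which is the kind of systematic error an unbiased scheme has to correct.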

Code & Models

Repositories

wenhaoli-xmu/UIO-LLMs (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies