LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

Junlong Tong; Jinlan Fu; Zixuan Lin; Yingqi Fan; Anhao Zhao; Hui Su; Xiaoyu Shen

arXiv:2505.16983·cs.CL·May 30, 2025

TL;DR

This paper introduces group position encoding, a method for adapting batch-oriented large language models to streaming processing that reduces the need for re-encoding and improves performance across tasks.

Contribution

The paper provides the first comprehensive analysis of how position encoding affects LLMs in streaming settings and proposes a group position encoding paradigm that outperforms existing methods.

Findings

01. Outperforms existing streaming adaptation methods.
02. Requires no architectural modifications.
03. Shows strong generalization across various tasks.

Abstract

Large Language Models (LLMs) are primarily designed for batch processing. Existing methods for adapting LLMs to streaming rely either on expensive re-encoding or specialized architectures with limited scalability. This work identifies three key mismatches in adapting batch-oriented LLMs to streaming: (1) input-attention, (2) output-attention, and (3) position-ID mismatches. While it is commonly assumed that the latter two mismatches require frequent re-encoding, our analysis reveals that only the input-attention mismatch significantly impacts performance, indicating re-encoding outputs is largely unnecessary. To better understand this discrepancy with the common assumption, we provide the first comprehensive analysis of the impact of position encoding on LLMs in streaming, showing that preserving relative positions within source and target contexts is more critical than maintaining…
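The position-ID point in the abstract can be illustrated with a small sketch. The abstract is truncated here, so the code below is an assumption about what "group" position IDs look like, not the authors' implementation: it assumes source and target tokens each keep their own contiguous position counter, so relative positions within each group match the batch layout even when tokens arrive interleaved. The function names, the `"s"`/`"t"` event labels, and the `target_offset` parameter are all hypothetical.

```python
from typing import List, Sequence, Tuple

PositionIds = Tuple[List[int], List[int]]  # (source ids, target ids)


def shared_counter_position_ids(events: Sequence[str]) -> PositionIds:
    """Assign IDs from one counter in arrival order (interleaved streaming).

    `events` holds "s" for a source token and "t" for a target token.
    """
    src, tgt = [], []
    for i, kind in enumerate(events):
        if kind == "s":
            src.append(i)
        elif kind == "t":
            tgt.append(i)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return src, tgt


def group_position_ids(events: Sequence[str], target_offset: int = 0) -> PositionIds:
    """Assign IDs from a separate counter per group (assumed group scheme).

    Source tokens count 0, 1, 2, ...; target tokens count from the
    hypothetical `target_offset`. Positions within each group stay
    contiguous regardless of how the two streams are interleaved.
    """
    src, tgt = [], []
    for kind in events:
        if kind == "s":
            src.append(len(src))
        elif kind == "t":
            tgt.append(target_offset + len(tgt))
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return src, tgt


# Two source tokens arrive, one target token is emitted, and so on.
stream = ["s", "s", "t", "s", "s", "t"]
shared_src, shared_tgt = shared_counter_position_ids(stream)
group_src, group_tgt = group_position_ids(stream)
print("shared counter:", shared_src, shared_tgt)
print("group counters:", group_src, group_tgt)
```

With a shared counter, the source positions have gaps wherever a target token was emitted, so the source context no longer looks like the contiguous sequence seen in batch processing. With per-group counters the gaps disappear, which is one way to read the abstract's claim that relative positions within the source and target contexts matter more than a single absolute order.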

Code & Models

Repositories

eit-nlp/streamingllm (PyTorch, official)

Models

JunlongTong/StreamingLLM (Hugging Face model)

Taxonomy

Topics: DNA and Biological Computing · Algorithms and Data Compression · Error Correcting Code Techniques