SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs
Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang

TL;DR
SepSeq is a training-free framework that improves how LLMs process long numerical sequences. It inserts separator tokens that focus attention on local segments, raising average relative accuracy by 35.6% and cutting inference token consumption by 16.4%.
Contribution
The paper introduces a training-free, plug-and-play method that inserts separator tokens to mitigate attention dispersion when LLMs process long numerical sequences.
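The core idea of separator insertion can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `insert_separators`, the segment length, the separator string, and the number delimiter are all hypothetical choices, since this summary does not specify which separator token SepSeq uses or where it places them.

```python
from typing import Sequence


def insert_separators(values: Sequence[float], segment_len: int = 8,
                      sep: str = " | ", delim: str = ", ") -> str:
    """Render numbers as prompt text, placing `sep` between fixed-size segments.

    `segment_len`, `sep`, and `delim` are illustrative assumptions, not
    values taken from the paper.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    # Split the sequence into consecutive chunks of at most `segment_len` items.
    segments = [
        delim.join(str(v) for v in values[i:i + segment_len])
        for i in range(0, len(values), segment_len)
    ]
    # Join the chunks with the separator so each chunk forms a local segment.
    return sep.join(segments)


if __name__ == "__main__":
    print(insert_separators([1, 2, 3, 4, 5], segment_len=2))
```

Because the transformation only rewrites the prompt text, it needs no change to model weights, which is consistent with the training-free, plug-and-play framing above.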
Findings
Average relative accuracy improved by 35.6% across 9 LLMs.
Reduced inference token consumption by 16.4% on average.
The accuracy gains hold across diverse domains.
Abstract
While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to attention dispersion in the softmax mechanism, which prevents the model from concentrating attention. To overcome this, we propose Separate Sequence (SepSeq), a training-free, plug-and-play framework that mitigates dispersion by strategically inserting separator tokens. Mechanistically, we demonstrate that separator tokens act as an attention sink, recalibrating attention to focus on local segments while preserving global context. Extensive evaluations on nine widely adopted LLMs confirm the effectiveness of our approach: SepSeq yields an average relative accuracy improvement of 35.6% across diverse domains while reducing total inference token consumption by 16.4% on…
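The dispersion effect the abstract attributes to softmax can be seen in a toy calculation. The sketch below is not the paper's analysis: it assumes one "relevant" token with a fixed logit advantage over identical competitors (the names `max_attention_weight`, `target_logit`, and `other_logit` are hypothetical), and it does not model the attention-sink behavior of separator tokens. It only shows that, under softmax, the weight on a single token shrinks as the number of competing tokens grows.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_attention_weight(n_tokens: int, target_logit: float = 2.0,
                         other_logit: float = 0.0) -> float:
    """Attention weight on one favored token among `n_tokens` total.

    The logit values are illustrative assumptions chosen for the demo.
    """
    scores = [target_logit] + [other_logit] * (n_tokens - 1)
    return softmax(scores)[0]


if __name__ == "__main__":
    # The favored token's share of attention falls as the sequence grows.
    for n in (10, 100, 1000):
        print(n, round(max_attention_weight(n), 4))
```

In this toy setting the favored token's weight is exp(2) / (exp(2) + n - 1), so it decays toward zero as n grows even though its logit never changes.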
