ScaleFormer: Span Representation Cumulation for Long-Context Transformer
Jiangshu Du, Wenpeng Yin, Philip Yu

TL;DR
ScaleFormer is a plug-and-play framework that enables pre-trained Transformer models to effectively process long documents by segmenting inputs and using a novel fusion mechanism to incorporate contextual information, improving long-form reasoning.
Contribution
It introduces a simple, parameter-free fusion method that adapts existing pre-trained encoder-decoder models to long-context tasks without architectural changes or pre-training from scratch.
Findings
Achieves linear complexity in processing long sequences
Outperforms state-of-the-art methods in long-document summarization
Enables effective long-form reasoning with pre-trained models
Abstract
The quadratic complexity of standard self-attention severely limits the application of Transformer-based models to long-context tasks. While efficient Transformer variants exist, they often require architectural changes and costly pre-training from scratch. To circumvent this, we propose ScaleFormer (Span Representation Cumulation for Long-Context Transformer), a simple and effective plug-and-play framework that adapts off-the-shelf pre-trained encoder-decoder models to process long sequences without requiring architectural modifications. Our approach segments long inputs into overlapping chunks and generates a compressed, context-aware representation for the decoder. The core of our method is a novel, parameter-free fusion mechanism that endows each chunk's representation with structural awareness of its position within the document. It achieves this by enriching each chunk's boundary…
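The segmentation step described in the abstract can be illustrated with a minimal sketch. It covers only the splitting of a token sequence into overlapping chunks. The function name and the `chunk_size` and `overlap` parameters are assumptions chosen for illustration, not names or values from the paper, and the fusion mechanism is not shown because the abstract is cut off before describing it.

```python
from typing import List


def split_into_overlapping_chunks(
    tokens: List[int], chunk_size: int, overlap: int
) -> List[List[int]]:
    """Split a token sequence into chunks that share `overlap` tokens.

    Hypothetical helper: `chunk_size` and `overlap` are illustrative
    parameters, not values taken from the paper.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not tokens:
        return []

    stride = chunk_size - overlap  # how far each new chunk advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    # Ten tokens, chunks of four, one shared token between neighbours.
    print(split_into_overlapping_chunks(list(range(10)), chunk_size=4, overlap=1))
    # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With a fixed chunk size and overlap, the number of chunks grows in proportion to the input length, and each chunk is short enough for a standard pre-trained encoder. This is consistent with the linear-complexity finding listed above, although the sketch itself does not measure cost.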
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
