Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers
Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

TL;DR
This paper introduces a simple framework that lets pre-trained transformers process long sequences, reducing computation and memory costs from quadratic to linear in the input length and improving performance on long-text tasks.
Contribution
The authors propose a novel method that divides long sequences into chunks, aligns inter-chunk information, and selects key hidden states using a reinforcement learning-inspired policy.
Findings
Improved long-text summarization performance.
Reduced computational complexity from quadratic to linear in the input sequence length.
Effective inter-chunk semantic alignment.
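The quadratic-to-linear claim can be checked with simple arithmetic. The sketch below counts pairwise token interactions only; it is not the authors' cost model, and the function names and the chunk size of 512 are illustrative assumptions. With a fixed chunk size c, a sequence of n tokens costs roughly (n / c) * c^2 = n * c interactions instead of n^2.

```python
def full_attention_cost(n: int) -> int:
    """Pairwise token interactions when attention spans the whole sequence."""
    return n * n


def chunked_attention_cost(n: int, chunk_size: int) -> int:
    """Pairwise interactions when attention is restricted to fixed-size chunks.

    The last chunk may be shorter than chunk_size, so it is counted separately.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks, remainder = divmod(n, chunk_size)
    return full_chunks * chunk_size * chunk_size + remainder * remainder


if __name__ == "__main__":
    # Hypothetical chunk size; the summary does not state the one used in the paper.
    for n in (1024, 2048, 4096):
        print(n, full_attention_cost(n), chunked_attention_cost(n, 512))
```

Doubling the input length quadruples the full-attention count but only doubles the chunked count, which is the linear growth the findings refer to.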
Abstract
Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the input sequence length. To alleviate the complexity of long-sequence processing, we propose a simple framework to enable the off-the-shelf pre-trained transformers to process much longer sequences, while the computation and memory costs remain growing linearly with the input sequence lengths. More specifically, our method divides each long-sequence input into a batch of chunks, then aligns the inter-chunk information during the encoding steps, and finally selects the most representative hidden states from the encoder for the decoding process. To extract inter-chunk semantic information, we align the start and end token embeddings among chunks in each…
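The chunk and align steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract is truncated before it says how the start and end embeddings are aligned, so averaging them across chunks is an assumption made here, and all function names are hypothetical. The selection step is left out because the excerpt gives no detail about its policy.

```python
from typing import List

Vector = List[float]


def split_into_chunks(token_ids: List[int], chunk_size: int,
                      bos_id: int, eos_id: int) -> List[List[int]]:
    """Split a long input into chunks, each wrapped in start and end tokens."""
    body = chunk_size - 2  # room left after the start and end tokens
    if body <= 0:
        raise ValueError("chunk_size must be at least 3")
    return [[bos_id] + token_ids[i:i + body] + [eos_id]
            for i in range(0, len(token_ids), body)]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def align_boundary_states(chunk_states: List[List[Vector]]) -> List[List[Vector]]:
    """Replace each chunk's first and last hidden state with the cross-chunk mean.

    ASSUMPTION: averaging is a stand-in; the source text is cut off before it
    describes the actual alignment operation.
    """
    if not chunk_states:
        return []
    start_mean = mean_vector([chunk[0] for chunk in chunk_states])
    end_mean = mean_vector([chunk[-1] for chunk in chunk_states])
    return [[start_mean] + chunk[1:-1] + [end_mean] for chunk in chunk_states]


if __name__ == "__main__":
    # Toy token ids; 100 and 101 are placeholder start and end token ids.
    print(split_into_chunks(list(range(10)), 6, bos_id=100, eos_id=101))
```

Because every chunk is encoded independently and only the two boundary states are shared, the per-chunk work stays constant as the input grows, which is what keeps the overall cost linear.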
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: ALIGN
