Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Jiawen Xie; Pengyu Cheng; Xiao Liang; Yong Dai; Nan Du

arXiv:2308.13191·cs.CL·July 8, 2024

TL;DR

This paper introduces a simple, efficient framework for processing long sequences with transformers. It reduces the growth of computational cost with input length from quadratic to linear and improves performance on long-text tasks.

Contribution

The authors propose a novel method that divides long sequences into chunks, aligns inter-chunk information, and selects key hidden states using a reinforcement learning-inspired policy.

Findings

01 Improved long-text summarization performance.

02 Reduced computational complexity from quadratic to linear.

03 Effective inter-chunk semantic alignment.
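
The quadratic-to-linear finding can be checked with simple counting. The sketch below is illustrative only and is not taken from the paper's code: it counts query-key pairs for full self-attention versus attention confined to fixed-size chunks. The function names and the chunk size of 512 are assumptions chosen for the example.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs when every token attends to every token: n^2."""
    return n_tokens * n_tokens


def chunked_attention_pairs(n_tokens: int, chunk_size: int) -> int:
    """Query-key pairs when attention is confined to fixed-size chunks.

    Each full chunk costs chunk_size^2; a shorter final chunk costs its own
    length squared. For n divisible by chunk_size this is n * chunk_size,
    which is linear in n once chunk_size is held fixed.
    """
    full_chunks, remainder = divmod(n_tokens, chunk_size)
    return full_chunks * chunk_size * chunk_size + remainder * remainder


if __name__ == "__main__":
    # chunk_size=512 is an arbitrary illustrative value, not one from the paper.
    for n in (1024, 4096, 16384):
        print(n, full_attention_pairs(n), chunked_attention_pairs(n, chunk_size=512))
```

With the chunk size held fixed, doubling the input length doubles the chunked count but quadruples the full-attention count.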

Abstract

Although dominant in natural language processing, transformer-based models remain challenged by long-sequence processing, because the computational cost of self-attention grows quadratically with the input sequence length. To alleviate the complexity of long-sequence processing, we propose a simple framework that enables off-the-shelf pre-trained transformers to process much longer sequences, while computation and memory costs grow only linearly with the input sequence length. More specifically, our method divides each long-sequence input into a batch of chunks, then aligns the inter-chunk information during the encoding steps, and finally selects the most representative hidden states from the encoder for the decoding process. To extract inter-chunk semantic information, we align the start and end token embeddings among chunks in each…
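
The three steps named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract is truncated here, so the exact alignment rule is not given; the sketch assumes alignment means replacing each chunk's start and end embeddings with their mean across chunks. Selection is approximated by a plain top-k over externally supplied scores, standing in for the learned policy the paper describes. All function names, token ids, and shapes are hypothetical.

```python
import numpy as np


def chunk(tokens: np.ndarray, chunk_len: int, bos_id: int, eos_id: int, pad_id: int) -> np.ndarray:
    """Split 1-D token ids into rows of [bos, body..., eos], padding the last row."""
    body_len = chunk_len - 2
    n_chunks = -(-len(tokens) // body_len)  # ceiling division
    padded = np.full(n_chunks * body_len, pad_id, dtype=tokens.dtype)
    padded[: len(tokens)] = tokens
    body = padded.reshape(n_chunks, body_len)
    bos = np.full((n_chunks, 1), bos_id, dtype=tokens.dtype)
    eos = np.full((n_chunks, 1), eos_id, dtype=tokens.dtype)
    return np.concatenate([bos, body, eos], axis=1)


def align(hidden: np.ndarray) -> np.ndarray:
    """Share start/end states across chunks (ASSUMED rule: mean over chunks).

    hidden has shape (n_chunks, chunk_len, d_model).
    """
    aligned = hidden.copy()
    aligned[:, 0, :] = hidden[:, 0, :].mean(axis=0)
    aligned[:, -1, :] = hidden[:, -1, :].mean(axis=0)
    return aligned


def select(hidden: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Keep the k highest-scoring hidden states in their original order.

    Top-k is a stand-in for the paper's learned selection policy.
    """
    flat = hidden.reshape(-1, hidden.shape[-1])
    keep = np.sort(np.argsort(scores.reshape(-1))[-k:])
    return flat[keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids = np.arange(10, 30)  # 20 toy token ids
    chunks = chunk(ids, chunk_len=8, bos_id=0, eos_id=2, pad_id=1)
    # Random vectors stand in for the output of one encoder layer.
    hidden = align(rng.normal(size=(*chunks.shape, 4)))
    scores = rng.random(chunks.shape)  # hypothetical per-state scores
    print(select(hidden, scores, k=5).shape)
```

In a real model the alignment would sit between encoder layers and the scores would come from a trained component; both are replaced by placeholders here to keep the sketch self-contained.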

Code & Models

Repositories

xjw-nlp/simcas (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: ALIGN