Efficient Long Sequence Encoding via Synchronization

Xiangyang Mou; Mo Yu; Bingsheng Yao; Lifu Huang

arXiv:2203.07644·cs.CL·March 16, 2022

TL;DR

This paper introduces a synchronization mechanism for hierarchical encoding in Transformers. It improves global information exchange across the segments of a long input sequence while keeping the efficiency of segment-wise encoding.

Contribution

The paper proposes a flexible synchronization framework that identifies anchor tokens across segments, groups them by their role in the original input, and synchronizes the anchor embeddings within each group inside the Transformer layers.

Findings

1. Improves information exchange among segments in long input sequences.
2. Maintains efficiency while enhancing global context understanding.
3. Effective on tasks like NarrativeQA and HotpotQA.

Abstract

Pre-trained Transformer models have achieved success in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge by segmenting the long sequence and then applying hierarchical encoding or post-hoc aggregation. We propose a synchronization mechanism for hierarchical encoding. Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence. Then, inside the Transformer layers, anchor embeddings are synchronized within their group via a self-attention module. Our approach is a general framework with sufficient flexibility: when adapted to a new task, it can easily be enhanced with task-specific anchor definitions. Experiments on two representative tasks with different types of long input texts, NarrativeQA summary setting and wild multi-hop reasoning from…
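The synchronization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes single-head scaled dot-product attention with no learned projections, and it assumes each anchor is given as a hypothetical `(segment_index, token_index)` pair. The function name `synchronize_anchors` and the data layout are illustrative choices, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def synchronize_anchors(segments, groups):
    """Attend among anchor embeddings that belong to the same group.

    segments: list of segments, each a list of token embeddings (lists of floats).
    groups:   list of anchor groups, each a list of (segment_idx, token_idx) pairs.
    Returns a new structure; non-anchor tokens are copied unchanged.
    Assumption: identity query/key/value projections, a single attention head.
    """
    out = [[list(tok) for tok in seg] for seg in segments]
    for group in groups:
        # Gather the anchor embeddings of this group from all segments.
        anchors = [segments[s][t] for s, t in group]
        scale = math.sqrt(len(anchors[0]))
        for (s, t), query in zip(group, anchors):
            # Scaled dot-product scores of this anchor against the whole group.
            scores = [sum(q * k for q, k in zip(query, key)) / scale
                      for key in anchors]
            weights = softmax(scores)
            # The synchronized embedding is the attention-weighted mix.
            out[s][t] = [sum(w * a[i] for w, a in zip(weights, anchors))
                         for i in range(len(query))]
    return out


# Two segments of two tokens each; the first token of each segment is an anchor
# and both anchors share one group (for example, the same entity mention).
segments = [[[1.0, 0.0], [0.0, 0.0]],
            [[0.0, 1.0], [0.5, 0.5]]]
groups = [[(0, 0), (1, 0)]]
synced = synchronize_anchors(segments, groups)
```

In this sketch only the anchor positions change, so each segment can still be encoded independently between synchronization steps. A trained model would use learned projections and, per the abstract, apply the step inside the Transformer layers.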

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam