Equipping Transformer with Random-Access Reading for Long-Context Understanding

Chenghao Yang; Zi Yang; Nan Hua

arXiv:2405.13216·cs.CL·May 24, 2024

TL;DR

This paper introduces a novel random-access reading strategy for transformers, enabling efficient long-document understanding without sequential token processing, addressing computational and length extrapolation challenges.

Contribution

It proposes a new random-access method inspired by human reading, allowing transformers to skip irrelevant tokens and process long texts more efficiently.

Findings

01. Experiments demonstrate effective long-document processing.
02. Computational complexity is reduced compared to traditional methods.
03. Length extrapolation capabilities of transformers are improved.

Abstract

Long-context modeling presents a significant challenge for transformer-based large language models (LLMs) due to the quadratic complexity of the self-attention mechanism and issues with length extrapolation caused by pretraining exclusively on short inputs. Existing methods address computational complexity through techniques such as text chunking, the kernel approach, and structured attention, and tackle length extrapolation problems through positional encoding, continued pretraining, and data engineering. These approaches typically require "sequential access" to the document, necessitating reading from the first to the last token. We contend that for goal-oriented reading of long documents, such sequential access is not necessary, and a proficiently trained model can learn to omit hundreds of less pertinent tokens. Inspired by human reading behaviors and existing empirical…
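
The complexity argument in the abstract can be made concrete with a rough back-of-the-envelope sketch. The Python below only counts pairwise attention scores. It compares full self-attention over every token with a reader that attends within a chunk, jumps ahead, and repeats. The function names, the chunk size, and the fixed skip length are hypothetical choices made for illustration. The visible part of the abstract does not say how skips are chosen, so this should not be read as the paper's method.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Pairwise attention scores when all n tokens attend to each other."""
    return n_tokens * n_tokens


def skipping_attention_pairs(n_tokens: int, chunk: int, skip: int) -> int:
    """Pairwise scores for a reader that attends within a chunk, then jumps.

    Assumption (not from the paper): attention is computed only inside each
    chunk, and the reader skips a fixed number of tokens after every chunk.
    """
    if chunk <= 0 or skip < 0:
        raise ValueError("chunk must be positive and skip non-negative")
    pairs = 0
    pos = 0
    while pos < n_tokens:
        size = min(chunk, n_tokens - pos)  # last chunk may be shorter
        pairs += size * size               # quadratic cost within the chunk
        pos += size + skip                 # jump over `skip` unread tokens
    return pairs


if __name__ == "__main__":
    n = 1000
    print("full attention:   ", full_attention_pairs(n))
    print("chunked, no skip: ", skipping_attention_pairs(n, chunk=100, skip=0))
    print("chunked with skip:", skipping_attention_pairs(n, chunk=100, skip=100))
```

Under these assumptions, chunking alone turns one quadratic term into a sum of much smaller ones, and skipping reduces the number of chunks that are read at all. A skip of zero recovers plain sequential chunked reading.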

Taxonomy

Topics: Neural Networks and Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques