Equipping Transformer with Random-Access Reading for Long-Context Understanding
Chenghao Yang, Zi Yang, Nan Hua

TL;DR
This paper introduces a novel random-access reading strategy for transformers, enabling efficient long-document understanding without sequential token processing, addressing computational and length extrapolation challenges.
Contribution
It proposes a new random-access method inspired by human reading, allowing transformers to skip irrelevant tokens and process long texts more efficiently.
Findings
Effective long-document processing demonstrated through experiments
Reduces computational complexity compared to traditional methods
Improves length extrapolation capabilities of transformers
Abstract
Long-context modeling presents a significant challenge for transformer-based large language models (LLMs) due to the quadratic complexity of the self-attention mechanism and issues with length extrapolation caused by pretraining exclusively on short inputs. Existing methods address computational complexity through techniques such as text chunking, the kernel approach, and structured attention, and tackle length extrapolation problems through positional encoding, continued pretraining, and data engineering. These approaches typically require sequential access to the document, necessitating reading from the first to the last token. We contend that for goal-oriented reading of long documents, such sequential access is not necessary, and a proficiently trained model can learn to omit hundreds of less pertinent tokens. Inspired by human reading behaviors and existing empirical…
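To make the cost argument in the abstract concrete, the following is a minimal sketch that counts query-key score entries for full self-attention over a document versus a reader that processes fixed-size chunks and jumps over the tokens in between. The function names, the fixed `chunk_len` and `skip_len`, and the choice to attend only within each chunk are all illustrative assumptions; the abstract describes a model that learns how far to skip, which this arithmetic does not attempt to reproduce.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of query-key score entries in full self-attention over num_tokens."""
    return num_tokens * num_tokens


def chunked_read_pairs(doc_len: int, chunk_len: int, skip_len: int) -> tuple[int, int]:
    """Read a chunk, jump ahead skip_len tokens, repeat until the document ends.

    Hypothetical reader: the skip distance is a fixed constant here (an
    assumption), and attention is computed only inside each chunk.
    Returns (tokens_read, total_pairs).
    """
    position = 0
    tokens_read = 0
    total_pairs = 0
    while position < doc_len:
        current = min(chunk_len, doc_len - position)  # last chunk may be shorter
        tokens_read += current
        total_pairs += attention_pairs(current)
        position += current + skip_len  # jump over the skipped tokens
    return tokens_read, total_pairs


full = attention_pairs(8192)
read, pairs = chunked_read_pairs(doc_len=8192, chunk_len=512, skip_len=512)
print(full, read, pairs)  # 67108864 4096 2097152
```

Under these assumed settings the reader touches half of the tokens and computes 32 times fewer attention scores than full attention over the whole document, which is the kind of saving that skipping less pertinent tokens is meant to unlock.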
Taxonomy
Topics: Neural Networks and Applications · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
