Pointer: Linear-Complexity Long-Range Modeling without Pre-training

Zixi Li

arXiv:2508.02631·cs.CL·August 5, 2025

TL;DR

Pointer is a linear-complexity architecture for long-range sequence modeling. It runs faster than standard transformers on long sequences and maintains high accuracy without pre-training, using explicit pointer chains to model dependencies.

Contribution

The paper proposes a pointer-based architecture that achieves linear $O(NK)$ complexity for long-range modeling without pre-training, learns interpretable pointer patterns, and reports a $2$--$10\times$ speedup over standard transformers on long sequences.

Findings

01. 2-10x speedup on long sequences compared to standard transformers

02. >95% accuracy on copy tasks at distances up to 2048 tokens

03. Learned pointer patterns reveal structured dependencies

Abstract

We introduce Pointer, a novel architecture that achieves linear $O(NK)$ complexity for long-range sequence modeling while maintaining superior performance without requiring pre-training. Unlike standard attention mechanisms that compute $O(N^{2})$ pairwise interactions, our approach uses layer-wise pointer chaining, where each layer's pointer selection depends on the previous layer's pointer positions, creating explicit long-distance connections through pointer chains. We demonstrate that this architecture achieves a $2$--$10\times$ speedup on long sequences compared to standard transformers, maintains $>95\%$ accuracy on copy tasks at distances up to 2048 tokens, and learns interpretable pointer patterns that reveal structured dependency modeling. Our experiments on efficiency benchmarks, long-range dependency tasks, and interpretability analysis show that Pointer offers a compelling…
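
To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the paper's implementation. It compares the $N^{2}$ interaction count of full attention with an $N K$ count, and follows one pointer per layer so that each layer's lookup starts from the position chosen by the previous layer. The function names, the value `k = 8`, and the hand-written pointer tables are all assumptions made for illustration; the abstract does not define $K$ or say how pointers are selected.

```python
from typing import List


def attention_interactions(n: int) -> int:
    """Pairwise interactions computed by full self-attention: N^2."""
    return n * n


def pointer_interactions(n: int, k: int) -> int:
    """Interaction count under an O(N*K) scheme.

    `k` is an assumed small per-token factor; the abstract does not define it.
    """
    return n * k


def follow_chain(pointers_per_layer: List[List[int]], start: int) -> List[int]:
    """Follow one pointer per layer, beginning at position `start`.

    pointers_per_layer[l][i] is the position that position i points to at
    layer l (hypothetical, hand-written tables here, not learned ones).
    Each layer's lookup begins at the position selected by the previous layer.
    """
    path = [start]
    pos = start
    for layer in pointers_per_layer:
        pos = layer[pos]
        path.append(pos)
    return path


if __name__ == "__main__":
    n, k = 2048, 8  # k = 8 is an arbitrary illustrative choice
    # Ratio of operation counts is N / K; this is not a wall-clock speedup.
    print(attention_interactions(n) // pointer_interactions(n, k))

    # Three toy layers over 8 positions; the hops get longer with depth.
    layers = [
        [0, 0, 1, 2, 3, 4, 5, 6],  # layer 0: point one step back
        [0, 0, 0, 1, 2, 3, 4, 5],  # layer 1: point two steps back
        [0, 0, 0, 0, 0, 1, 2, 3],  # layer 2: point four steps back
    ]
    print(follow_chain(layers, start=7))
```

In this toy setup, three single-pointer hops connect position 7 to position 0, which illustrates how a chain of short lookups can span a long distance. The operation-count ratio is only a count of interactions and should not be read as the measured $2$--$10\times$ speedup.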

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Machine Learning and Data Classification