FBS: Modeling Native Parallel Reading inside a Transformer

Tongxi Wang

arXiv:2601.21708 · cs.AI · April 9, 2026

TL;DR

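The paper introduces the Fovea-Block-Skip Transformer (FBS), a Transformer architecture that models human-like parallel reading through content-adaptive foresight, chunk-aware compute allocation, and train-test-consistent preview and skimming. It improves the quality-efficiency trade-off without adding parameters.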

Contribution

The paper proposes FBS, which adds three modules inspired by human reading strategies to the Transformer: Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG). Together they inject a causal, trainable loop into the model.

Findings

01 FBS improves the quality-efficiency trade-off across diverse benchmarks.

02 Ablations show that the three modules (PAW, CH, and SG) are complementary.

03 FBS achieves these gains without increasing the parameter count.

Abstract

Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core human-reading ingredients: content-adaptive foresight, chunk-structure-aware compute allocation, and train-test consistency for preview/skimming. We propose the Fovea-Block-Skip Transformer (FBS), which injects a causal, trainable loop into Transformers via Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality-efficiency trade-off without increasing parameters, and ablations show the three modules are complementary.
