FBS: Modeling Native Parallel Reading inside a Transformer
Tongxi Wang

TL;DR
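The paper introduces the Fovea-Block-Skip Transformer (FBS), a Transformer architecture that models native parallel reading by building in ingredients of human reading (content-adaptive foresight, chunk-structure-aware compute allocation, and train-test consistency for preview/skimming). It improves the quality-efficiency trade-off without increasing parameters.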
Contribution
It proposes the FBS model, which injects a causal, trainable loop into the Transformer through three modules inspired by human reading strategies: Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG).
Findings
FBS improves the quality-efficiency trade-off across benchmarks.
Ablations show that the three modules (PAW, CH, and SG) are complementary.
FBS achieves these gains without increasing the number of parameters.
Abstract
Large language models (LLMs) excel across many tasks, yet inference is still dominated by strictly token-by-token autoregression. Existing acceleration methods largely patch this pipeline and miss core human-reading ingredients: content-adaptive foresight, chunk-structure-aware compute allocation, and train-test consistency for preview/skimming. We propose the Fovea-Block-Skip Transformer (FBS), which injects a causal, trainable loop into Transformers via Parafovea-Attention Window (PAW), Chunk-Head (CH), and Skip-Gate (SG). Across diverse benchmarks, FBS improves the quality-efficiency trade-off without increasing parameters, and ablations show the three modules are complementary.
