DRetHTR: Linear-Time Decoder-Only Retentive Network for Handwritten Text Recognition
Changhun Kim, Martin Mayr, Thomas Gorges, Fei Wu, Mathias Seuret, Andreas Maier, Vincent Christlein

TL;DR
The paper introduces DRetHTR, a decoder-only Retentive Network for handwritten text recognition. It decodes faster and uses less memory than an equally sized decoder-only Transformer while matching its accuracy, relying on softmax-free retention and layer-wise gamma scaling.
Contribution
It proposes a novel decoder-only Retentive Network architecture with softmax-free retention and layer-wise gamma scaling, enabling linear-time decoding and improved efficiency in handwritten text recognition.
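To make the idea of layer-wise gamma scaling more concrete, here is a minimal sketch. It assumes the standard RetNet decay, in which a token at distance d is weighted by gamma**d, so the weights sum to roughly 1 / (1 - gamma) tokens of "effective horizon". The function names, the endpoint values 0.5 and 0.99, and the linear interpolation of the horizon are illustrative assumptions, not the schedule used in the paper.

```python
# Illustrative only: the gamma values and the schedule below are assumptions,
# not the values or the schedule used in DRetHTR.
from typing import List


def effective_horizon(gamma: float) -> float:
    """Sum of the decay weights gamma**d for d = 0, 1, 2, ... (geometric series)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def layerwise_gammas(num_layers: int, first: float = 0.5, last: float = 0.99) -> List[float]:
    """Hypothetical schedule: grow the horizon linearly from the first to the last layer."""
    h_first, h_last = effective_horizon(first), effective_horizon(last)
    gammas = []
    for layer in range(num_layers):
        # t runs from 0.0 (first layer) to 1.0 (last layer)
        t = layer / (num_layers - 1) if num_layers > 1 else 0.0
        horizon = h_first + t * (h_last - h_first)
        # Invert horizon = 1 / (1 - gamma) to recover the decay for this layer
        gammas.append(1.0 - 1.0 / horizon)
    return gammas


if __name__ == "__main__":
    for layer, gamma in enumerate(layerwise_gammas(4)):
        horizon = effective_horizon(gamma)
        print(f"layer {layer}: gamma={gamma:.4f}, horizon~{horizon:.1f} tokens")
```

Under this assumed schedule, early layers forget quickly and deeper layers retain a longer context, which is the local-to-global behaviour the contribution describes.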
Findings
Achieves 1.6-1.9x faster inference than an equally sized decoder-only Transformer baseline.
Reduces memory usage by 38-42% relative to the same baseline, without accuracy loss.
Attains state-of-the-art character error rates on multiple datasets.
Abstract
State-of-the-art handwritten text recognition (HTR) systems commonly use Transformers, whose growing key-value (KV) cache makes decoding slow and memory-intensive. We introduce DRetHTR, a decoder-only model built on Retentive Networks (RetNet). Compared to an equally sized decoder-only Transformer baseline, DRetHTR delivers 1.6-1.9x faster inference with 38-42% less memory usage, without loss of accuracy. By replacing softmax attention with softmax-free retention and injecting multi-scale sequential priors, DRetHTR avoids a growing KV cache: decoding is linear in output length in both time and memory. To recover the local-to-global inductive bias of attention, we propose layer-wise gamma scaling, which progressively enlarges the effective retention horizon in deeper layers. This encourages early layers to model short-range dependencies and later layers to capture broader context,…
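The claim that retention avoids a growing KV cache can be illustrated with a small sketch of single-head, softmax-free retention in its recurrent form, as defined for RetNet in general. It keeps one fixed-size state matrix and updates it once per decoded token, so the cost per step does not depend on how many tokens came before. The function names and the pure-Python list representation are assumptions for illustration; the sketch omits positional rotation, normalization, gating, and multiple heads, and is not DRetHTR's implementation.

```python
# Sketch of single-head retention without softmax. Names and shapes are
# illustrative assumptions, not DRetHTR's actual implementation.

def retention_recurrent(qs, ks, vs, gamma):
    """Decode step by step with a fixed-size state S of shape (d_k, d_v)."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        # S_n = gamma * S_{n-1} + k_n^T v_n  (state size is independent of n)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k[i] * v[j]
        # o_n = q_n S_n
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def retention_parallel(qs, ks, vs, gamma):
    """Reference form: o_n = sum over m <= n of gamma**(n-m) * (q_n . k_m) * v_m."""
    d_v = len(vs[0])
    outputs = []
    for n, q in enumerate(qs):
        out = [0.0] * d_v
        for m in range(n + 1):
            score = gamma ** (n - m) * sum(a * b for a, b in zip(q, ks[m]))
            for j in range(d_v):
                out[j] += score * vs[m][j]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    ks = [[1.0, 2.0], [0.0, 1.0], [1.0, 1.0]]
    vs = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]]
    rec = retention_recurrent(qs, ks, vs, gamma=0.9)
    par = retention_parallel(qs, ks, vs, gamma=0.9)
    same = all(abs(a - b) < 1e-9 for r, p in zip(rec, par) for a, b in zip(r, p))
    print("recurrent and parallel forms agree:", same)
```

The parallel function revisits every earlier token at each step, which is the access pattern a KV cache serves; the recurrent function produces the same outputs while storing only the d_k x d_v state.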
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Topic Modeling
