Few-shot Writer Adaptation via Multimodal In-Context Learning
Tom Simon, Stephane Nicolas, Pierrick Tranouez, Clement Chatelain, Thierry Paquet

TL;DR
This paper introduces a novel multimodal in-context learning framework for handwritten text recognition that enables writer-specific adaptation during inference without parameter updates, improving accuracy on standard benchmarks.
Contribution
It proposes a context-driven HTR method that adapts to a new writer from only a few of their examples without fine-tuning, and designs an efficient 8M-parameter CNN-Transformer model.
Findings
Achieves Character Error Rates of 3.92% on IAM and 2.34% on RIMES.
Outperforms all writer-independent HTR models while requiring no parameter updates.
Analyzes the impact of context length and shows the benefit of combined training strategies.
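The results above are reported as Character Error Rate (CER). This summary does not include the paper's evaluation script. CER is commonly defined as the character-level Levenshtein (edit) distance between the predicted and reference transcriptions, divided by the number of reference characters. The sketch below is a minimal version that assumes corpus-level aggregation (total edits over total reference characters). Some works average per line instead, so the paper's exact normalization may differ.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete r
                curr[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),   # substitute (free when characters match)
            )
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER in percent (assumed aggregation: total edits / total reference chars)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return 100.0 * edits / chars


print(f"{cer(['hello world'], ['helo world']):.2f}")  # 9.09 (1 deletion / 11 chars)
```

A CER of 3.92% therefore means roughly four character edits per hundred reference characters.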
Abstract
While state-of-the-art Handwritten Text Recognition (HTR) models perform well on standard benchmarks, they frequently struggle with writers exhibiting highly specific styles that are underrepresented in the training data. To handle unseen and atypical writers, writer adaptation techniques personalize HTR models to individual handwriting styles. Leading writer adaptation methods require either offline fine-tuning or parameter updates at inference time, both involving gradient computation and backpropagation, which increase computational costs and demand careful hyperparameter tuning. In this work, we propose a novel context-driven HTR framework inspired by multimodal in-context learning, enabling inference-time writer adaptation using only a few examples from the target writer without any parameter updates. We further demonstrate the impact of context length, design a compact…
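To make "adaptation without parameter updates" concrete, here is a minimal sketch of how a multimodal in-context input could be assembled. It interleaves a few labelled (line image, transcription) pairs from the target writer and then appends the unlabelled query line. A frozen recognizer would read the whole sequence and decode the query. Only this context changes per writer, never the weights. The layout is an assumption and may not match the paper's model. The segment tags, the `Example` container, the `build_context` helper and the per-column feature vectors are all hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical segment tags; the paper's actual input layout is not given in this summary.
IMG, TXT, SEP = "img", "txt", "sep"
Segment = Tuple[str, object]


@dataclass
class Example:
    """One labelled text line from the target writer (hypothetical container)."""
    image_features: List[List[float]]  # e.g. one vector per column position from a CNN encoder
    text: str                          # ground-truth transcription of that line


def build_context(
    examples: Sequence[Example],
    query_features: List[List[float]],
    vocab: Dict[str, int],
) -> List[Segment]:
    """Interleave (image, text) pairs from the writer, then append the unlabelled query image."""
    seq: List[Segment] = []
    for ex in examples:
        seq.extend((IMG, vec) for vec in ex.image_features)  # visual tokens of the example
        seq.extend((TXT, vocab[ch]) for ch in ex.text)       # its character ids
        seq.append((SEP, None))                              # end of one context example
    seq.extend((IMG, vec) for vec in query_features)         # query line to be transcribed
    return seq


# Toy usage: two context lines from the same writer, then a query line.
vocab = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
ctx = [Example([[0.1, 0.2], [0.3, 0.4]], "ab"), Example([[0.5, 0.6]], "c")]
seq = build_context(ctx, [[0.7, 0.8], [0.9, 1.0]], vocab)
print([tag for tag, _ in seq])
# ['img', 'img', 'txt', 'txt', 'sep', 'img', 'txt', 'sep', 'img', 'img']
```

Under this layout, increasing the number of context examples lengthens the sequence the recognizer must attend over. This is one reason context length is a natural variable to study.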
