Few-shot Writer Adaptation via Multimodal In-Context Learning

Tom Simon; Stephane Nicolas; Pierrick Tranouez; Clement Chatelain; Thierry Paquet

arXiv:2603.29450·cs.CV·April 1, 2026

TL;DR

This paper introduces a novel multimodal in-context learning framework for handwritten text recognition that enables writer-specific adaptation during inference without parameter updates, improving accuracy on standard benchmarks.

Contribution

The paper proposes a new context-driven HTR method that adapts to a writer from a few examples, without fine-tuning, and designs an efficient 8M-parameter CNN-Transformer model.

Findings

01. Achieves Character Error Rates of 3.92% on IAM and 2.34% on RIMES.

02. Outperforms all writer-independent HTR models without parameter updates.

03. Demonstrates the effectiveness of context length and combined training strategies.
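
For readers unfamiliar with the metric behind the IAM and RIMES figures, Character Error Rate is conventionally the character-level edit distance between the predicted and reference transcriptions, divided by the number of reference characters. The sketch below is a minimal illustration of that convention, not the paper's evaluation code. Aggregating edits over the whole test set (rather than averaging per line) is an assumption, and the function names `levenshtein` and `cer` are illustrative.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard dynamic-programming solution)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.
    Assumption: pooling over all lines; per-line averaging is another convention."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this convention, a CER of 3.92% corresponds to roughly 4 character edits per 100 reference characters.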

Abstract

While state-of-the-art Handwritten Text Recognition (HTR) models perform well on standard benchmarks, they frequently struggle with writers exhibiting highly specific styles that are underrepresented in the training data. To handle unseen and atypical writers, writer adaptation techniques personalize HTR models to individual handwriting styles. Leading writer adaptation methods require either offline fine-tuning or parameter updates at inference time, both involving gradient computation and backpropagation, which increase computational costs and demand careful hyperparameter tuning. In this work, we propose a novel context-driven HTR framework inspired by multimodal in-context learning, enabling inference-time writer adaptation using only a few examples from the target writer without any parameter updates. We further demonstrate the impact of context length, design a compact…
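
The general idea of in-context writer adaptation described in the abstract can be illustrated with a small sketch: a few transcribed lines from the target writer are placed in front of the query line, and a frozen model reads the whole sequence in a single forward pass. Everything below is hypothetical and is not the paper's implementation. The excerpt does not specify how the context is laid out, so the interleaved image/text ordering, the `Example` type, and the `build_context` and `recognize` helpers are all assumptions made for illustration, and images are represented by plain string placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A context item is a (modality, payload) pair. Hypothetical representation.
ContextItem = Tuple[str, str]


@dataclass(frozen=True)
class Example:
    """One transcribed line from the target writer (hypothetical type)."""
    image: str  # placeholder for a line image; a real system would hold pixels
    text: str   # ground-truth transcription of that line


def build_context(
    support: Sequence[Example], query_image: str, max_examples: int
) -> List[ContextItem]:
    """Interleave up to `max_examples` (image, text) pairs, then append the query.
    Assumption: this interleaved layout is illustrative, not taken from the paper."""
    context: List[ContextItem] = []
    for example in support[:max_examples]:
        context.append(("image", example.image))
        context.append(("text", example.text))
    context.append(("image", query_image))
    return context


def recognize(
    model: Callable[[List[ContextItem]], str],
    support: Sequence[Example],
    query_image: str,
    max_examples: int = 4,
) -> str:
    """Transcribe the query with a frozen model: one forward call and no
    gradient step, so adaptation comes only from the context contents."""
    return model(build_context(support, query_image, max_examples))
```

In this sketch, `max_examples` plays the role of context length: raising it gives the model more samples of the writer's style at the cost of a longer input sequence.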
