Whispering to a Blackbox: Bootstrapping Frozen OCR with Visual Prompts

Samandar Samandarov; Nazirjon Ismoiljonov; Abdullah Sattorov; Temirlan Sabyrbayev

arXiv:2603.05276 · cs.LG · March 6, 2026

Open Access

TL;DR

This paper presents Whisperer, a visual prompting framework that improves frozen OCR models such as EasyOCR by learning diffusion-based preprocessors that adapt inputs in pixel space. Its sample-efficient behavioral cloning approach amplifies improvements found by stochastic exploration and reduces Character Error Rate (CER) by 8% absolute (10.6% relative).

Contribution

Introduces a novel diffusion-based visual prompting method that improves frozen OCR performance via a four-stage behavioral cloning curriculum, avoiding reinforcement learning pitfalls.

Findings

01

Achieves an 8% absolute (10.6% relative) reduction in CER on a dataset of 300k degraded synthetic text images.

02

Surpasses hand-engineered preprocessing baselines such as CLAHE.

03

Uses a sample-efficient four-stage training process over 60 GPU-hours.

Abstract

In the landscape of modern machine learning, frozen pre-trained models provide stability and efficiency but often underperform on specific tasks due to mismatched data distributions. This paper introduces the Whisperer, a novel visual prompting framework that learns diffusion-based preprocessors to adapt inputs in pixel space, effectively "whispering" enhancements to frozen downstream models like EasyOCR. By framing the process as behavioral cloning of stochastically discovered improvement policies, our method achieves an 8% absolute (10.6% relative) reduction in Character Error Rate (CER) on a challenging dataset of 300k degraded synthetic text images, surpassing hand-engineered baselines such as CLAHE. The key innovation is a four-stage training curriculum that uses behavioral cloning to amplify "lucky" improvements discovered through the stochastic exploration of a partially trained…
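The selection idea in the abstract (keep the "lucky" candidates that make the frozen reader do better, then imitate them) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `levenshtein`, `cer`, `select_lucky`, and `toy_reader` are hypothetical, plain strings stand in for images, and the toy reader stands in for a frozen OCR model. Only the CER definition (edit distance divided by reference length) is standard.

```python
from typing import Callable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(prediction, reference) / len(reference)


def select_lucky(candidates: List[str],
                 read_text: Callable[[str], str],
                 reference: str,
                 baseline_cer: float) -> List[str]:
    """Keep candidates that the frozen reader scores strictly better than baseline.

    In a behavioral cloning setup, the kept candidates would serve as
    imitation targets for the next round of preprocessor training.
    """
    return [c for c in candidates
            if cer(read_text(c), reference) < baseline_cer]


# Hypothetical stand-in for a frozen OCR model: it "reads" exactly what it sees.
def toy_reader(image: str) -> str:
    return image


reference = "hello world"
degraded = "he1lo w0rld"          # stand-in for a degraded input image
baseline = cer(toy_reader(degraded), reference)

# Stand-ins for stochastic preprocessor outputs: one better, one equal, one worse.
candidates = ["hello w0rld", "he1lo w0rld", "he1lo w0r1d"]
lucky = select_lucky(candidates, toy_reader, reference, baseline)
```

Here only the first candidate lowers the error rate relative to the unprocessed input, so it is the only one retained. The reader is never updated; only the inputs it sees change, which is the sense in which the downstream model stays frozen.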

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning