InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write
Blagoj Mitrevski, Arina Rak, Julian Schnitzler, Chengkun Li, Andrii Maksai, Jesse Berent, Claudiu Musat

TL;DR
InkSight introduces a vision-language model that converts offline handwritten notes into online digital ink, bridging the gap between pen-and-paper and digital note-taking with high accuracy and good generalization beyond its training domain.
Contribution
This work is the first to successfully derender handwritten text (recover its digital ink) in diverse photos and sketches by combining reading and writing priors, without requiring large paired datasets.
Findings
87% of outputs are valid tracings on the HierText dataset
67% of outputs resemble human pen trajectories
Effective generalization beyond training domain
Abstract
Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in a vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice that is still favored by a vast majority. Our work, InkSight, aims to bridge this gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing a model to be trained in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively…
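To make the two forms concrete, the sketch below assumes a simple, hypothetical representation of digital ink: an ordered list of strokes, each a pen-down to pen-up sequence of (x, y, t) points. The names `Ink` and `render` are illustrative and are not taken from InkSight. The sketch shows only the forward direction, rendering ink into a binary image (a stand-in for offline handwriting), using Bresenham line rasterization in place of a real renderer. Derendering, the task the abstract describes, is the inverse of this lossy map.

```python
from dataclasses import dataclass, field

# Hypothetical representation (assumption, not InkSight's format):
# a point is (x, y, t); a stroke is one pen-down..pen-up trajectory.
Point = tuple[float, float, float]


@dataclass
class Ink:
    strokes: list[list[Point]] = field(default_factory=list)

    def num_points(self) -> int:
        """Total number of sampled points across all strokes."""
        return sum(len(s) for s in self.strokes)


def _draw_line(grid: list[list[int]], x0: int, y0: int, x1: int, y1: int) -> None:
    """Bresenham's line algorithm: set every pixel between two integer endpoints."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        if 0 <= y0 < len(grid) and 0 <= x0 < len(grid[0]):
            grid[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render(ink: Ink, width: int, height: int) -> list[list[int]]:
    """Rasterize ink to a binary image; timing and stroke order are discarded."""
    grid = [[0] * width for _ in range(height)]
    for stroke in ink.strokes:
        pts = [(round(x), round(y)) for x, y, _t in stroke]
        if len(pts) == 1:
            pts = pts * 2  # a single tap renders as one dot
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            _draw_line(grid, x0, y0, x1, y1)
    return grid


if __name__ == "__main__":
    ink = Ink([[(1, 1, 0), (1, 4, 1), (3, 4, 2)]])  # an "L" drawn top-down
    for row in render(ink, 5, 5):
        print("".join("#" if v else "." for v in row))
```

Because `render` drops timing and stroke order, drawing the same "L" in the opposite direction yields an identical image in this sketch, so the pixels alone do not determine which pen trajectory produced them.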
Taxonomy
Topics: Handwritten Text Recognition Techniques · Human Motion and Animation · Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · SentencePiece · Attention Dropout · Linear Layer · Inverse Square Root Schedule · Gated Linear Unit · Multi-Head Attention · Residual Connection
