Predicting the Original Appearance of Damaged Historical Documents
Zhenhua Yang, Dezhi Peng, Yongxin Shi, Yuyi Zhang, Chongyu Liu, and Lianwen Jin

TL;DR
This paper introduces a new task, Historical Document Repair (HDR), and proposes a large-scale dataset, HDR28K, and a diffusion-based network, DiffHDR, for restoring damaged historical documents.
Contribution
The paper presents the HDR28K dataset (28,552 damaged-repaired image pairs with character-level annotations and multi-style degradations) and the DiffHDR model, a diffusion-based approach to repairing damaged historical documents.
Findings
DiffHDR outperforms existing methods in repairing damaged documents.
The HDR28K dataset supports training and evaluation of document repair models.
DiffHDR can be extended to document editing and text generation.
Abstract
Historical documents encompass a wealth of cultural treasures but suffer from severe damage over time, including missing characters, paper damage, and ink erosion. However, existing document processing methods primarily focus on tasks such as binarization and enhancement, neglecting the repair of these damages. To this end, we present a new task, termed Historical Document Repair (HDR), which aims to predict the original appearance of damaged historical documents. To fill the gap in this field, we propose a large-scale dataset HDR28K and a diffusion-based network DiffHDR for historical document repair. Specifically, HDR28K contains 28,552 damaged-repaired image pairs with character-level annotations and multi-style degradations. Moreover, DiffHDR augments the vanilla diffusion framework with semantic and spatial information and a meticulously designed character perceptual loss for contextual and…
Taxonomy
Topics: Handwritten Text Recognition Techniques
Methods: Diffusion · Focus
