Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration

Yuyi Zhang; Peirong Zhang; Zhenhua Yang; Pengyu Yan; Yongxin Shi; Pengwei Liu; Fengjun Guo; Lianwen Jin

arXiv:2507.05108·cs.CV·July 22, 2025

TL;DR

This paper introduces a full-page dataset (FPHDR) and an automated method (AutoHDR) for restoring severely damaged historical documents. AutoHDR raises OCR accuracy from 46.83% to 84.05%, and human-machine collaboration raises it further to 94.25%.

Contribution

The paper presents FPHDR, a new full-page Historical Document Restoration (HDR) dataset, and AutoHDR, a three-stage restoration approach whose modular design supports human-machine collaboration.

Findings

01. OCR accuracy improved from 46.83% to 84.05% with AutoHDR

02. Further enhancement to 94.25% through human-machine collaboration

03. AutoHDR outperforms existing methods in restoring severely damaged documents
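
To put the accuracy figures above on a common scale, the following minimal sketch computes the gain in percentage points and the relative reduction in error. It assumes the error rate is simply 100 minus the reported OCR accuracy, which the page does not state; the function names are illustrative only.

```python
# Reported OCR accuracies (percent), taken from the findings above.
BASELINE = 46.83        # before restoration
AUTOHDR = 84.05         # after AutoHDR
COLLABORATIVE = 94.25   # after human-machine collaboration


def point_gain(before: float, after: float) -> float:
    """Absolute improvement in percentage points."""
    return round(after - before, 2)


def relative_error_reduction(before: float, after: float) -> float:
    """Fraction of the remaining error removed.

    Assumption: error rate = 100 - accuracy (not stated in the source).
    """
    return (after - before) / (100.0 - before)


auto_gain = point_gain(BASELINE, AUTOHDR)            # 37.22 points
collab_gain = point_gain(AUTOHDR, COLLABORATIVE)     # 10.2 points
auto_reduction = relative_error_reduction(BASELINE, AUTOHDR)
collab_reduction = relative_error_reduction(AUTOHDR, COLLABORATIVE)
```

Under that assumption, the automated stage removes roughly 70% of the baseline error, and collaboration removes roughly 64% of what remains.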

Abstract

Historical documents represent an invaluable cultural heritage, yet have undergone significant degradation over time through tears, water erosion, and oxidation. Existing Historical Document Restoration (HDR) methods primarily focus on single modality or limited-size restoration, failing to meet practical needs. To fill this gap, we present a full-page HDR dataset (FPHDR) and a novel automated HDR solution (AutoHDR). Specifically, FPHDR comprises 1,633 real and 6,543 synthetic images with character-level and line-level locations, as well as character annotations in different damage grades. AutoHDR mimics historians' restoration workflows through a three-stage approach: OCR-assisted damage localization, vision-language context text prediction, and patch autoregressive appearance restoration. The modular architecture of AutoHDR enables seamless human-machine collaboration, allowing for…
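
The three-stage workflow described in the abstract can be pictured as a pipeline of interchangeable stages with an optional human review step between them. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the page is modeled as a string with `?` marking damaged characters, and every name (`DamagedRegion`, `restore_page`, the `toy_*` functions) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DamagedRegion:
    """Hypothetical record for one damaged character position."""
    position: int
    predicted_text: Optional[str] = None


def restore_page(
    page: str,
    localize: Callable[[str], List[DamagedRegion]],
    predict_text: Callable[[str, DamagedRegion], str],
    restore_appearance: Callable[[str, DamagedRegion], str],
    review: Optional[Callable[[List[DamagedRegion]], List[DamagedRegion]]] = None,
) -> Tuple[str, List[DamagedRegion]]:
    """Run the stages in order; `review` is an optional human correction hook."""
    regions = localize(page)                      # stage 1: damage localization
    for region in regions:                        # stage 2: context text prediction
        region.predicted_text = predict_text(page, region)
    if review is not None:                        # a person may correct predictions
        regions = review(regions)
    for region in regions:                        # stage 3: appearance restoration
        page = restore_appearance(page, region)
    return page, regions


# Toy stand-ins for the real models (assumptions for illustration only).
def toy_localize(page: str) -> List[DamagedRegion]:
    return [DamagedRegion(i) for i, ch in enumerate(page) if ch == "?"]


def toy_predict(page: str, region: DamagedRegion) -> str:
    return "e"  # a real system would use surrounding context here


def toy_restore(page: str, region: DamagedRegion) -> str:
    # Replaces one character with one character, so positions stay valid.
    i = region.position
    return page[:i] + (region.predicted_text or "?") + page[i + 1:]


def toy_review(regions: List[DamagedRegion]) -> List[DamagedRegion]:
    regions[-1].predicted_text = "y"  # simulated human correction
    return regions


automatic, _ = restore_page("r?stor?", toy_localize, toy_predict, toy_restore)
reviewed, _ = restore_page("r?stor?", toy_localize, toy_predict, toy_restore, toy_review)
```

Keeping each stage behind its own callable is what lets a reviewer intervene between prediction and rendering without touching the other stages, which is one plausible reading of the modularity the abstract mentions.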

Code & Models

Repositories

yeungchenwa/hdr
pytorch

Videos

Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration · Underline