DvD: Unleashing a Generative Paradigm for Document Dewarping via Coordinates-based Diffusion Model

Weiguang Zhang; Huangcheng Lu; Maizhen Ning; Xiaowei Huang; Wei Wang; Kaizhu Huang; Qiufeng Wang

arXiv:2505.21975·cs.CV·October 10, 2025

Open Access

TL;DR

This paper introduces DvD, a novel generative diffusion model for document dewarping that uses coordinate-level denoising and a time-variant refinement mechanism, achieving state-of-the-art results and providing a new large-scale benchmark.

Contribution

It presents the first diffusion-based generative model for document dewarping with coordinate-level denoising and a new comprehensive benchmark dataset.

Findings

01

DvD achieves state-of-the-art performance on multiple benchmarks.

02

Coordinate-level denoising effectively preserves document structures.

03

The new benchmark enables more thorough evaluation of dewarping models.

Abstract

Document dewarping aims to rectify deformations in photographic document images and thereby improve text readability. The task has attracted much attention and seen great progress, but preserving document structures remains challenging. Given recent advances in diffusion models, it is natural to consider their applicability to document dewarping. However, adopting diffusion models for document dewarping is far from straightforward because of their unfaithful control over highly complex document images (e.g., 2000 $\times$ 3000 resolution). In this paper, we propose DvD, the first generative model to tackle document Dewarping via a Diffusion framework. Specifically, DvD introduces coordinate-level denoising instead of typical pixel-level denoising, generating a mapping for deformation rectification. In addition, we further propose a time-variant condition refinement mechanism…
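To make "generating a mapping for deformation rectification" concrete, here is a minimal sketch of how a coordinate mapping might be applied once it has been predicted. It does not reproduce DvD or its diffusion process. The backward-map convention (each output pixel stores the source coordinate to read from), the bilinear interpolation, and the names `bilinear_sample` and `rectify` are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a real-valued (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp so out-of-range coordinates reuse the nearest edge pixel.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify(warped, mapping):
    """Apply a backward map (assumed convention, not taken from the paper).

    mapping[i][j] is the (x, y) position in `warped` whose value should
    appear at output pixel (row i, column j).
    """
    return [[bilinear_sample(warped, x, y) for (x, y) in row] for row in mapping]


# Toy deformation: the "warped" image is the flat one shifted right by a pixel.
flat = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
warped = [[0.0, 0.0, 1.0, 2.0], [0.0, 3.0, 4.0, 5.0]]

# The mapping that undoes the shift: read from one column to the right.
mapping = [[(j + 1.0, float(i)) for j in range(3)] for i in range(2)]
restored = rectify(warped, mapping)
```

In this toy case `restored` equals `flat`. A coordinate-level model only has to output the per-pixel mapping and can leave the pixel values to the resampling step, which is one plausible reason the abstract contrasts it with pixel-level denoising on high-resolution documents.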

Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques