BookNet: Book Image Rectification via Cross-Page Attention Network

Shaokai Liu; Hao Feng; Bozhi Luan; Min Hou; Jiajun Deng; Wengang Zhou

arXiv:2601.21938 · cs.CV · January 30, 2026

Open Access

TL;DR

BookNet is a novel deep learning framework that effectively rectifies complex curved book images by modeling the geometric relationship between adjacent pages using cross-page attention, supported by new synthetic and real datasets.

Contribution

It introduces the first end-to-end dual-page rectification network with cross-page attention and provides large-scale synthetic and real datasets for training and evaluation.

Findings

01. Outperforms existing methods on rectification accuracy
02. Effectively models inter-page geometric relationships
03. Demonstrates robustness on real-world book images

Abstract

Book image rectification presents unique challenges in document image processing due to complex geometric distortions from binding constraints, where left and right pages exhibit distinctly asymmetric curvature patterns. However, existing single-page document image rectification methods fail to capture the coupled geometric relationships between adjacent pages in books. In this work, we introduce BookNet, the first end-to-end deep learning framework specifically designed for dual-page book image rectification. BookNet adopts a dual-branch architecture with cross-page attention mechanisms, enabling it to estimate warping flows for both individual pages and the complete book spread, explicitly modeling how left and right pages influence each other. Moreover, to address the absence of specialized datasets, we present Book3D, a large-scale synthetic dataset for training, and Book100, a…
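To make the cross-page attention idea in the abstract concrete, the following is a minimal sketch of bidirectional cross-attention between two sets of page features. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cross_attention`, `cross_page_block`), the token representation as plain lists of floats, the residual connection, and the absence of learned projections are all hypothetical simplifications. The abstract does not specify how BookNet builds its attention layers or its flow heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query token (from one page) attends over the key/value tokens
    (from the other page). Learned Q/K/V projections are omitted here;
    a real network would include them (assumption for brevity).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def cross_page_block(left_tokens, right_tokens):
    """Hypothetical dual-branch block: each page's features are updated
    with context gathered from the other page, via a residual sum."""
    left_ctx = cross_attention(left_tokens, right_tokens, right_tokens)
    right_ctx = cross_attention(right_tokens, left_tokens, left_tokens)
    left_out = [
        [t + c for t, c in zip(tok, ctx)]
        for tok, ctx in zip(left_tokens, left_ctx)
    ]
    right_out = [
        [t + c for t, c in zip(tok, ctx)]
        for tok, ctx in zip(right_tokens, right_ctx)
    ]
    return left_out, right_out


if __name__ == "__main__":
    # Toy features: 2 tokens for the left page, 3 for the right page.
    left = [[1.0, 0.0], [0.0, 1.0]]
    right = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
    new_left, new_right = cross_page_block(left, right)
    print(len(new_left), len(new_right))
```

Each branch keeps its own token count, so per-page outputs (for example, a flow prediction head per page) could be attached to either branch afterward. In a practical setting the tokens would come from a feature extractor over each page image, and the attention would use learned projections and multiple heads.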

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques