TL;DR
This paper introduces a CNN-based model with attention for closed-set writer identification in historical Arabic manuscripts, reports baselines under both line-level and page-disjoint evaluation protocols, and expands the writer labels of the Muharaf dataset so that accuracy and generalization can be assessed on far more data.
Contribution
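To the authors' knowledge, it reports the first baselines under both line-level and page-disjoint evaluation protocols on the Muharaf dataset, manually verifies and expands the dataset's writer labels, and benchmarks multiple configurations for writer identification in historical Arabic texts.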
Findings
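Line-level identification reaches 99.05% accuracy.
Accuracy drops to 78.61% under the page-disjoint protocol.
Writer labels were expanded from 6,858 lines (28.00%) to 21,249 lines (86.75%) of the 24,495 public line images; 18,987 lines (77.51%) were retained after further filtering.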
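The gap between the two protocols comes down to how line images are assigned to the training and test sets. The sketch below illustrates the general idea and is not the authors' code: the field names `writer` and `page`, the 20% test fraction, and both function names are assumptions made for this example. A line-level split shuffles individual lines, so lines from one page can appear in both sets. A page-disjoint split holds out whole pages, so the model is tested only on pages it has never seen.

```python
import random
from collections import defaultdict


def line_level_split(samples, test_frac=0.2, seed=0):
    """Shuffle individual lines; one page may contribute to both sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def page_disjoint_split(samples, test_frac=0.2, seed=0):
    """Hold out whole pages per writer; no page appears in both sets."""
    rng = random.Random(seed)

    # Collect the distinct pages written by each writer.
    pages_by_writer = defaultdict(set)
    for s in samples:
        pages_by_writer[s["writer"]].add(s["page"])

    test_pages = set()
    for writer, pages in pages_by_writer.items():
        pages = sorted(pages)
        rng.shuffle(pages)
        # Closed-set setting: keep at least one page per writer for training,
        # so a writer with a single page contributes no test lines.
        n_test = min(len(pages) - 1, max(1, round(len(pages) * test_frac)))
        test_pages.update((writer, p) for p in pages[:n_test])

    train = [s for s in samples if (s["writer"], s["page"]) not in test_pages]
    test = [s for s in samples if (s["writer"], s["page"]) in test_pages]
    return train, test
```

Each sample here is assumed to be a dictionary such as `{"writer": "w1", "page": "p3", "line": 7}`. Under the page-disjoint variant a model can no longer rely on page-specific cues such as paper texture, ink, or scan conditions, which is one plausible reason for the lower accuracy.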
Abstract
Handwritten Arabic manuscripts preserve the Arab world's intellectual and cultural heritage, and writer identification supports provenance, authenticity verification, and historical analysis. Using the Muharaf dataset of historical Arabic manuscripts, we evaluate writer identification from individual line images and, to the best of our knowledge, provide the first baselines reported under both line-level and page-disjoint evaluation protocols. Since the dataset is only partially labeled for writer identification, we manually verified and expanded writer labels in the public portion from 6,858 (28.00%) to 21,249 lines (86.75%) out of 24,495 line images, correcting inconsistencies and removing non-handwritten text. After further filtering, we retained 18,987 lines (77.51%). We propose a Convolutional Neural Network (CNN)-based model with attention mechanisms for closed-set writer…
