DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold

TL;DR
DIVA-DAF is an open-source deep learning framework based on PyTorch Lightning, designed to simplify and accelerate historical document image analysis tasks like segmentation and classification.
Contribution
It introduces a customizable, easy-to-use framework with pre-implemented tasks and modules that significantly reduce development and training time for historical document analysis.
Findings
Time savings in programming document analysis tasks
Reduced model training time due to efficient data modules
Flexibility in customizing tasks and architectures
Abstract
Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be time-consuming. This is why we propose an open-source deep learning framework, DIVA-DAF, which is based on PyTorch Lightning and specifically designed for historical document analysis. Pre-implemented tasks such as segmentation and classification can be easily used or customized. It is also easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets, and different forms of ground truth. The applications conducted have demonstrated time savings for the programming of a document analysis task, as well as for different scenarios such as pre-training or changing the architecture. Thanks to its data module, the…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Digital Media Forensic Detection
Methods: Convolution · Max Pooling · Concatenated Skip Connection · U-Net
