Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization
Anastasiia Kabeshova, Guillaume Betmont, Julien Lerouge, Evgeny Stepankevich, Alexis Bergès

TL;DR
This paper introduces SDL-Net, a U-Net based architecture for structured document localization that is pre-trainable on generic datasets, enabling efficient fine-tuning for new document classes with limited labeled data.
Contribution
The paper presents SDL-Net, a novel, data-efficient U-Net like architecture that improves document localization by leveraging pre-training and fast fine-tuning on new document classes.
Findings
SDL-Net achieves high localization accuracy on a proprietary dataset.
Pre-training the encoder on a generic dataset improves generalization to new document classes.
Fine-tuning decoders for new document classes requires less labelled data and fewer computational resources.
Abstract
Structured documents analysis and recognition are essential for modern online on-boarding processes, and document localization is a crucial step to achieve reliable key information extraction. While deep-learning has become the standard technique used to solve document analysis problems, real-world applications in industry still face the limited availability of labelled data and of computational resources when training or fine-tuning deep-learning models. To tackle these challenges, we propose SDL-Net: a novel U-Net like encoder-decoder architecture for the localization of structured documents. Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes, and enables fast and data-efficient fine-tuning of decoders to support the localization of new document classes. We conduct extensive experiments on a proprietary dataset…
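The split described in the abstract, one shared encoder plus a separate decoder per document class, can be illustrated with a toy NumPy sketch. This is not the SDL-Net architecture: the layer sizes, the 1x1 convolutions, the class names `SharedEncoder` and `ClassDecoder`, and the document classes `passport` and `id_card` are all assumptions made for illustration. It only shows how max pooling, upsampling, and a concatenated skip connection fit together, and which weights would stay frozen versus trained.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling on a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample_2x(x):
    """Nearest-neighbour 2x upsampling on a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution (channel mixing with a (C_out, C_in) matrix) plus ReLU."""
    return np.maximum(np.einsum("oc,chw->ohw", weight, x), 0.0)


class SharedEncoder:
    """Hypothetical encoder standing in for one that is pre-trained, then frozen."""

    def __init__(self, rng, in_ch=3, ch=8):
        self.w1 = rng.normal(size=(ch, in_ch))
        self.w2 = rng.normal(size=(2 * ch, ch))

    def __call__(self, image):
        skip = conv1x1(image, self.w1)                  # full-resolution features
        bottom = conv1x1(max_pool_2x2(skip), self.w2)   # half-resolution features
        return skip, bottom


class ClassDecoder:
    """Hypothetical per-class decoder; only these weights would be fine-tuned."""

    def __init__(self, rng, ch=8):
        self.w = rng.normal(size=(1, 3 * ch))

    def __call__(self, skip, bottom):
        # Concatenated skip connection: upsampled deep features + shallow features.
        merged = np.concatenate([upsample_2x(bottom), skip], axis=0)
        logits = np.einsum("oc,chw->ohw", self.w, merged)
        # Sigmoid gives a per-pixel "belongs to the document" score in [0, 1].
        return 1.0 / (1.0 + np.exp(-np.clip(logits, -30.0, 30.0)))


rng = np.random.default_rng(0)
encoder = SharedEncoder(rng)
# Document class names are made up for this example.
decoders = {"passport": ClassDecoder(rng), "id_card": ClassDecoder(rng)}

image = rng.random((3, 32, 32))
skip, bottom = encoder(image)  # computed once, reused by every decoder
masks = {name: decode(skip, bottom) for name, decode in decoders.items()}
```

Under these assumptions, supporting a new document class means adding one more entry to `decoders` and training only its weights, while the encoder features are computed once and shared.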
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Advanced Neural Network Applications
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net
