DoSA: A System to Accelerate Annotations on Business Documents with Human-in-the-Loop
Neelesh K Shukla, Msp Raja, Raghu Katikeri, Amit Vaid

TL;DR
This paper introduces DoSA, a system that accelerates annotation of business documents by combining automated initial annotations with human review, enabling iterative improvement of document-specific models.
Contribution
The paper presents a novel bootstrap approach for automated annotations in business documents, leveraging generic datasets and models to reduce manual effort.
Findings
Automated initial annotations significantly reduce manual labeling time.
Iterative human-in-the-loop review improves model accuracy over time.
The system is applicable to form-like business documents.
Abstract
Business documents come in a variety of structures, formats, and information needs, which makes information extraction a challenging task. Because of these variations, a document-generic model that works well across all types of documents and all use cases seems far-fetched. Document-specific models, in turn, require customized document-specific labels. We introduce DoSA (Document Specific Automated Annotations), which helps annotators generate initial annotations automatically using our novel bootstrap approach, which leverages document-generic datasets and models. These initial annotations can then be reviewed by a human for correctness. An initial document-specific model can be trained, and its inference can be used as feedback for generating more automated annotations. These automated annotations can be reviewed by a human-in-the-loop for correctness and a new…
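The loop described in the abstract (generic model proposes annotations, a human reviews them, a document-specific model is trained, and its predictions seed the next round) can be illustrated with a minimal sketch. This is not the authors' implementation: `Document`, `bootstrap_loop`, `parse_fields`, `make_reviewer`, and `train` are hypothetical names, the "models" are toy key-matching functions, and the human reviewer is simulated with a lookup table of correct labels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Annotation = Dict[str, str]  # field label -> extracted value


@dataclass
class Document:
    doc_id: str
    text: str
    annotation: Annotation = field(default_factory=dict)


Annotator = Callable[[Document], Annotation]


def parse_fields(text: str, keys: Set[str]) -> Annotation:
    """Toy extractor: keep 'key: value' lines whose key is in `keys`."""
    found: Annotation = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in keys:
            found[key.strip().lower()] = value.strip()
    return found


def bootstrap_loop(
    docs: List[Document],
    generic_annotator: Annotator,
    review: Callable[[Document, Annotation], Annotation],
    train: Callable[[List[Document]], Annotator],
    batch_size: int = 2,
) -> Tuple[Annotator, int]:
    """Annotate in batches; retrain after every reviewed batch (assumed schedule)."""
    annotator = generic_annotator  # round 1 uses the generic model
    pending, labeled, rounds = list(docs), [], 0
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        for doc in batch:
            proposed = annotator(doc)               # automated annotation
            doc.annotation = review(doc, proposed)  # human-in-the-loop correction
        labeled.extend(batch)
        annotator = train(labeled)                  # document-specific model
        rounds += 1
    return annotator, rounds


def make_reviewer(gold: Dict[str, Annotation]):
    """Simulated human: returns the correct labels and logs how many fields were fixed."""
    corrections: List[int] = []

    def review(doc: Document, proposed: Annotation) -> Annotation:
        truth = gold[doc.doc_id]
        corrections.append(sum(1 for k in truth if proposed.get(k) != truth[k]))
        return dict(truth)

    return review, corrections


def train(labeled: List[Document]) -> Annotator:
    """Toy 'training': remember every field label seen in reviewed annotations."""
    keys = {k for d in labeled for k in d.annotation}
    return lambda doc: parse_fields(doc.text, keys)


# Toy data: four invoices; the generic annotator only knows the "total" field.
docs = [Document(f"d{i}", f"Invoice No: {i}\nTotal: {i * 10}") for i in range(4)]
gold = {d.doc_id: {"invoice no": str(i), "total": str(i * 10)} for i, d in enumerate(docs)}
review, corrections = make_reviewer(gold)
model, rounds = bootstrap_loop(docs, lambda d: parse_fields(d.text, {"total"}), review, train)
print(corrections, rounds)
```

In this toy run the simulated reviewer has to add one field per document in the first batch and none in the second, which mirrors the intended effect of the feedback loop: review effort falls as the document-specific model takes over from the generic one. The batch size and the retrain-after-every-batch schedule are assumptions made for the sketch.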
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Scientific Computing and Data Management
