End-to-end Document Recognition and Understanding with Dessurt

Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, and Vlad Morariu

arXiv:2203.16618 · cs.CV · June 17, 2022 · 1 citation

TL;DR

Dessurt is a versatile end-to-end transformer model for document understanding that integrates text recognition with document understanding, so it can be fine-tuned on diverse document tasks without an external recognition model.

Contribution

It introduces Dessurt, a simple, flexible transformer architecture capable of handling multiple document understanding tasks in an end-to-end manner.

Findings

01. Effective on 9 dataset-task combinations

02. Does not require external recognition models

03. Handles diverse document domains and tasks

Abstract

We introduce Dessurt, a relatively simple document understanding transformer capable of being fine-tuned on a greater variety of document tasks than prior methods. It receives a document image and task string as input and generates arbitrary text autoregressively as output. Because Dessurt is an end-to-end architecture that performs text recognition in addition to the document understanding, it does not require an external recognition model as prior methods do. Dessurt is a more flexible model than prior methods and is able to handle a variety of document domains and tasks. We show that this model is effective at 9 different dataset-task combinations.
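The abstract describes Dessurt's interface as a document image plus a task string in, and arbitrary text out, produced autoregressively (one token at a time, each conditioned on the tokens already generated). The following is a minimal sketch of only that input/output contract, using greedy decoding. It is not the authors' implementation: `generate`, `toy_step`, the task strings `"read"` and `"question: total?"`, and the `</s>` end token are all hypothetical, and `toy_step` is a stand-in for a trained model that ignores the image.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a "model step" scores candidate next tokens given the
# document image, the task string, and the tokens generated so far.
StepFn = Callable[[Sequence[Sequence[float]], str, List[str]], Dict[str, float]]


def generate(
    image: Sequence[Sequence[float]],
    task: str,
    step: StepFn,
    end_token: str = "</s>",  # hypothetical end-of-sequence marker
    max_len: int = 64,
) -> str:
    """Greedy autoregressive decoding: image + task string in, free-form text out."""
    output: List[str] = []
    for _ in range(max_len):
        scores = step(image, task, output)
        # Greedy choice: take the highest-scoring candidate token.
        token = max(scores, key=scores.get)
        if token == end_token:
            break
        output.append(token)
    return "".join(output)


# Hypothetical toy model: it "reads" a fixed string when the task is "read"
# and answers "unknown" for any other task. The image is ignored.
def toy_step(image, task, prefix):
    target = "Invoice #42" if task == "read" else "unknown"
    i = len(prefix)
    nxt = target[i] if i < len(target) else "</s>"
    return {nxt: 1.0, "?": 0.1}


image = [[0.0] * 4 for _ in range(4)]  # placeholder pixels
print(generate(image, "read", toy_step))              # Invoice #42
print(generate(image, "question: total?", toy_step))  # unknown
```

Greedy argmax is simply the easiest decoding strategy to show; the same loop works with beam search or sampling. Because the output is unconstrained text, changing only the task string changes what the same decoder produces, which is one way to read the abstract's claim that a single model handles many document tasks.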

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling