The future of document indexing: GPT and Donut revolutionize table of content processing
Degaga Wolde Feyisa, Haylemicheal Berihun, Amanuel Zewdu, Mahsa, Najimoghadam, Marzieh Zare

TL;DR
This paper presents a novel AI-driven approach combining Donut and GPT-3.5 Turbo to automate extraction and structuring of table of contents from complex documents, significantly improving efficiency in document indexing.
Contribution
It introduces a new methodology that automates ToC extraction from scanned documents using AI models without OCR, achieving high accuracy and advancing document indexing technology.
Findings
Donut achieves 85% accuracy in extracting ToCs.
GPT-3.5 Turbo reaches 89% accuracy in structuring ToCs.
The approach significantly reduces manual effort in document processing.
Abstract
Industrial projects rely heavily on lengthy, complex specification documents, making tedious manual extraction of structured information a major bottleneck. This paper introduces an innovative approach to automate this process, leveraging the capabilities of two cutting-edge AI models: Donut, a model that extracts information directly from scanned documents without OCR, and OpenAI GPT-3.5 Turbo, a robust large language model. The proposed methodology is initiated by acquiring the table of contents (ToCs) from construction specification documents and subsequently structuring the ToCs text into JSON data. Remarkable accuracy is achieved, with Donut reaching 85% and GPT-3.5 Turbo reaching 89% in effectively organizing the ToCs. This landmark achievement represents a significant leap forward in document indexing, demonstrating the immense potential of AI to automate information extraction…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics, Computing, and Information Processing
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · {Dispute@FaQ-s}How to file a dispute with Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Residual Connection · Weight Decay · Linear Layer · Dense Connections · Adam · Dropout
