Automated Invoice Data Extraction: Using LLM and OCR

Khushi Khanchandani; Advait Thakur; Akshita Shetty; Chaitravi Reddy; Ritisa Behera

arXiv:2511.05547 · cs.CV · January 9, 2026

Open Access

TL;DR

This paper presents a comprehensive AI platform that combines OCR, deep learning, LLMs, and graph analytics to significantly improve invoice data extraction accuracy and flexibility across diverse document layouts.

Contribution

It introduces an integrated AI system leveraging OCR, LLMs, and graph analytics, advancing invoice data extraction beyond existing hybrid approaches.

Findings

01. Higher extraction accuracy than traditional OCR methods

02. Enhanced contextual understanding with LLMs

03. Robust performance across varied invoice formats

Abstract

Conventional Optical Character Recognition (OCR) systems struggle with varied invoice layouts, handwritten text, and low-quality scans, often because strong template dependencies restrict their flexibility across different document structures and layouts. Newer solutions use deep learning models such as Convolutional Neural Networks (CNNs) and Transformers, along with domain-specific models, for better layout analysis and higher accuracy across varied document types. Large Language Models (LLMs) have reshaped extraction pipelines with sophisticated entity recognition and semantic comprehension, supporting complex contextual relationship mapping without explicit programming. Visual Named Entity Recognition (NER) capabilities permit extraction from invoice images with greater contextual sensitivity and much…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies