Information Extraction from Electricity Invoices with General-Purpose Large Language Models
Javier Gómez, Javier Sánchez

TL;DR
This paper evaluates the effectiveness of general-purpose large language models in extracting structured data from Spanish electricity invoices, emphasizing prompt engineering over hyperparameter tuning.
Contribution
It demonstrates that prompt design significantly affects extraction accuracy: well-designed prompts achieve an F1-score of over 97% on invoice data with minimal hyperparameter tuning.
Findings
Prompt quality has a greater effect on performance than hyperparameter tuning.
Few-shot prompting with cross-validation yields the highest F1-scores (~97%).
Document structure influences extraction difficulty.
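To make the F1 figures above concrete, here is a minimal sketch of how a field-level F1-score could be computed for one invoice. The scoring rule (exact match on normalized values) and the field names are assumptions for illustration; the summary does not state the paper's actual evaluation protocol.

```python
def field_f1(predicted: dict, expected: dict) -> float:
    """F1 over field/value pairs for a single invoice (assumed scoring rule)."""
    def norm(value) -> str:
        # Assumption: values are compared after trimming and lowercasing.
        return str(value).strip().lower()

    # A predicted field is a true positive only if the key exists in the
    # ground truth and the normalized values match.
    tp = sum(
        1
        for key, value in predicted.items()
        if key in expected and norm(value) == norm(expected[key])
    )
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical field names and values, not taken from the IDSEM dataset.
expected = {"customer_name": "Ana Ruiz", "total_amount": "84.20", "invoice_number": "F-001"}
predicted = {"customer_name": "ana ruiz", "total_amount": "84.20", "invoice_number": "F-010"}
score = field_f1(predicted, expected)
```

With two of three fields correct, precision and recall are both 2/3, so the score is 2/3 as well.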
Abstract
Information extraction from semi-structured business documents remains a critical challenge for enterprise management. This study evaluates the capability of general-purpose Large Language Models to extract structured information from Spanish electricity invoices without task-specific fine-tuning. Using a subset of the IDSEM dataset, we benchmark two architecturally distinct models, Gemini 1.5 Pro and Mistral-small, across 19 parameter configurations and 6 prompting strategies. Our experimental framework treats prompt engineering as the primary experimental variable, comparing zero-shot baselines against increasingly sophisticated few-shot approaches and iterative extraction strategies. Results demonstrate that prompt quality dominates over hyperparameter tuning: the F1-score variation across all parameter configurations is marginal, while the gap between zero-shot and the best few-shot…
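The difference between the zero-shot and few-shot strategies mentioned in the abstract can be illustrated with a small prompt-assembly sketch. The instruction wording, the `build_prompt` helper, and the example fields are all hypothetical; they are not the prompts used in the paper.

```python
import json


def build_prompt(invoice_text: str, examples=()) -> str:
    """Assemble an extraction prompt; with no examples this is a zero-shot prompt."""
    parts = [
        "Extract the requested fields from the electricity invoice "
        "and answer with JSON only."
    ]
    # Each few-shot example pairs an invoice text with its expected JSON output.
    for example_text, example_fields in examples:
        example_json = json.dumps(example_fields, ensure_ascii=False)
        parts.append(f"Invoice:\n{example_text}\nJSON:\n{example_json}")
    # The target invoice comes last, leaving the JSON answer for the model.
    parts.append(f"Invoice:\n{invoice_text}\nJSON:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Factura N. F-002 ...")
few_shot = build_prompt(
    "Factura N. F-002 ...",
    examples=[("Factura N. F-001 ...", {"invoice_number": "F-001"})],
)
```

The same function covers both cases, so the number and choice of examples becomes the only variable that changes between strategies.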
