Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models
Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, and Andreas Fischer

TL;DR
This paper explores the potential of large language models to classify documents effectively with minimal or no training data, challenging traditional reliance on extensive labeled datasets.
Contribution
It systematically investigates zero-shot prompting and few-shot fine-tuning of LLMs for document classification, offering insights into reducing annotation efforts.
Findings
LLMs can achieve competitive accuracy with zero-shot prompting.
Few-shot fine-tuning improves performance significantly.
Both approaches could reduce reliance on large annotated datasets.
Abstract
Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.
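To make the zero-shot setting concrete, the following is a minimal sketch of how a text-based zero-shot prompt for RVL-CDIP might be assembled and how a free-text model answer might be mapped back to a class. It is an illustration under stated assumptions, not the authors' pipeline: the prompt wording, the function names `build_zero_shot_prompt` and `parse_label`, and the `max_chars` truncation are hypothetical, and the input is assumed to be OCR text already extracted from the scan. The label list is the 16 categories of the RVL-CDIP benchmark.

```python
import re

# The 16 document categories of the RVL-CDIP benchmark.
RVL_CDIP_LABELS = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]


def build_zero_shot_prompt(ocr_text, labels=RVL_CDIP_LABELS, max_chars=2000):
    """Build a classification prompt that contains no training examples.

    The wording is a hypothetical template; max_chars truncates long OCR
    output so the prompt stays within a model's context window.
    """
    label_list = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the following scanned document into exactly one category.\n"
        f"Categories:\n{label_list}\n\n"
        f"Document text (OCR):\n{ocr_text[:max_chars]}\n\n"
        "Answer with the category name only."
    )


def parse_label(response, labels=RVL_CDIP_LABELS):
    """Map a free-text model answer to a known label, or None if no match."""
    cleaned = response.strip().lower()
    # Whole-word matching avoids false hits such as "form" in "information".
    for label in sorted(labels, key=len, reverse=True):
        if re.search(r"\b" + re.escape(label) + r"\b", cleaned):
            return label
    return None
```

The prompt string would be sent to whichever LLM is under evaluation; returning `None` for unparseable answers lets an evaluation count them as errors rather than silently guessing a class.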
