Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

Anna Scius-Bertrand; Michael Jungo; Lars Vögtlin; Jean-Marc Spat; Andreas Fischer

arXiv:2412.13859 · cs.CV · December 19, 2024

TL;DR

This paper explores whether large language models can classify scanned document images effectively with few or no labeled training samples, challenging the traditional reliance on large labeled datasets.

Contribution

It systematically investigates zero-shot prompting and few-shot fine-tuning of LLMs for document image classification, with the aim of reducing annotation effort.

Findings

01. LLMs can achieve competitive accuracy with zero-shot prompting.
02. Few-shot fine-tuning improves performance significantly.
03. These approaches could reduce reliance on large annotated datasets.

Abstract

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in on near-perfect performance when considering hundreds of thousands of training samples. With the advent of large language models (LLMs), which are excellent few-shot learners, the question arises as to what extent the document classification problem can be addressed with only a few training samples, or even none at all. In this paper, we investigate this question in the context of zero-shot prompting and few-shot model fine-tuning, with the aim of reducing the need for human-annotated training samples as much as possible.
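The abstract does not spell out the prompts or the fine-tuning protocol, so the following is only a rough illustration, not the authors' method. It assumes a text-only setting in which OCR text has already been extracted from each scanned page and some LLM returns a free-text answer. It lists the 16 RVL-CDIP categories, builds a zero-shot classification prompt, maps the answer back to a label, and draws k labeled samples per class as a small fine-tuning set. The prompt wording and the names `build_zero_shot_prompt`, `parse_label`, `sample_few_shot`, `max_chars`, `k`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict

# The 16 RVL-CDIP document categories.
RVL_CDIP_LABELS = (
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
)


def build_zero_shot_prompt(ocr_text, labels=RVL_CDIP_LABELS, max_chars=2000):
    """Hypothetical zero-shot prompt: candidate classes first, then (truncated) OCR text."""
    options = ", ".join(labels)
    return (
        "Classify the following scanned document into exactly one of these "
        f"categories: {options}.\n"
        "Answer with the category name only.\n\n"
        f"Document text:\n{ocr_text[:max_chars]}\n\n"
        "Category:"
    )


def parse_label(answer, labels=RVL_CDIP_LABELS):
    """Map a free-text LLM answer to one of `labels`, or None if nothing matches."""
    text = answer.strip().lower()
    if text in labels:
        return text
    # Fall back to substring matching and prefer the longest matching label.
    # Naive: "form" also matches inside words such as "information".
    matches = [label for label in labels if label in text]
    return max(matches, key=len) if matches else None


def sample_few_shot(dataset, k, seed=0):
    """Draw up to k (item, label) pairs per class, e.g. as a few-shot fine-tuning set."""
    by_label = defaultdict(list)
    for item, label in dataset:
        by_label[label].append(item)
    rng = random.Random(seed)  # fixed seed gives a reproducible subset
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        for item in rng.sample(items, min(k, len(items))):
            subset.append((item, label))
    return subset
```

For example, `parse_label(" Invoice\n")` returns `"invoice"`, and `sample_few_shot(pairs, k=2)` on a list of `(item, label)` pairs keeps at most two items per class. Constraining the model to answer with a category name and parsing it back is one simple way to score a generative model like a closed-set classifier; a real pipeline might instead use constrained decoding or compare label likelihoods.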
