Structuring Authenticity Assessments on Historical Documents using LLMs

Andrea Schimmenti; Valentina Pasqual; Francesca Tomasi; Fabio Vitali; and Marieke van Erp

arXiv:2407.09290·cs.DL·July 15, 2024·1 cites

Open Access

TL;DR

This paper presents a method using Large Language Models and Semantic Web technologies to extract, classify, and structure scholarly debates on the authenticity of historical documents, enabling large-scale analysis.

Contribution

It introduces a novel pipeline that automatically generates structured data on document authenticity assessments from natural language texts without requiring training.

Findings

01. Creates a catalogue of debated documents with scholarly opinions

02. Enables complex queries and analysis of authenticity debates over centuries

03. Demonstrates effective extraction and classification of authenticity claims

Abstract

Given the wide use of forgery throughout history, scholars have been and continue to be engaged in assessing the authenticity of historical documents. However, online catalogues merely offer descriptive metadata for these documents and relegate discussions about their authenticity to free-text formats, which makes it difficult to study these assessments at scale. This study explores the generation of structured data about documents' authenticity assessments from natural language texts. Our pipeline exploits Large Language Models (LLMs) to select, extract and classify relevant claims about the topic without the need for training, and Semantic Web technologies to structure and type-validate the LLM's results. The final output is a catalogue of documents whose authenticity has been debated, along with scholars' opinions on their authenticity. This process can serve as a valuable resource for…
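
The "structure and type-validate" step described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `AuthenticityClaim` fields, the three stance labels, the `ex:` predicates, and the sample JSON are hypothetical and are not taken from the paper, whose actual schema and ontology may differ. No LLM is called; a hard-coded string stands in for a model response.

```python
import json
from dataclasses import dataclass

# Hypothetical label set; the paper's actual classification scheme may differ.
ALLOWED_STANCES = {"authentic", "forged", "uncertain"}


@dataclass(frozen=True)
class AuthenticityClaim:
    document: str  # the document whose authenticity is debated
    scholar: str   # who holds the opinion
    stance: str    # one of ALLOWED_STANCES


def parse_claim(raw: str) -> AuthenticityClaim:
    """Type-validate one JSON object, as an LLM might return it."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in ("document", "scholar", "stance"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or non-string field: {field}")
    stance = data["stance"].strip().lower()
    if stance not in ALLOWED_STANCES:
        raise ValueError(f"unknown stance: {stance!r}")
    return AuthenticityClaim(
        data["document"].strip(), data["scholar"].strip(), stance
    )


def to_triples(claim: AuthenticityClaim, claim_id: str) -> list[tuple[str, str, str]]:
    """Flatten a validated claim into subject-predicate-object triples.

    The ex: predicates are placeholders, not a real ontology.
    """
    return [
        (claim_id, "ex:aboutDocument", claim.document),
        (claim_id, "ex:assertedBy", claim.scholar),
        (claim_id, "ex:stance", claim.stance),
    ]


# Illustrative stand-in for a model response.
sample = (
    '{"document": "Donation of Constantine", '
    '"scholar": "Lorenzo Valla", "stance": "Forged"}'
)
claim = parse_claim(sample)
triples = to_triples(claim, "ex:claim1")
```

Rejecting any response that fails validation, instead of trying to repair it, keeps malformed model output out of the resulting catalogue; in a real pipeline the triples would presumably be serialized with an RDF library and checked against the chosen ontology.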

Taxonomy

Topics: Digital and Traditional Archives Management