Structuring Authenticity Assessments on Historical Documents using LLMs
Andrea Schimmenti, Valentina Pasqual, Francesca Tomasi, Fabio Vitali, and Marieke van Erp

TL;DR
This paper presents a method using Large Language Models and Semantic Web technologies to extract, classify, and structure scholarly debates on the authenticity of historical documents, enabling large-scale analysis.
Contribution
It introduces a novel pipeline that automatically generates structured data on document authenticity assessments from natural language texts without requiring training.
Findings
Creates a catalogue of debated documents with scholarly opinions
Enables complex queries and analysis of authenticity debates over centuries
Demonstrates effective extraction and classification of authenticity claims
Abstract
Given the wide use of forgery throughout history, scholars have long been, and continue to be, engaged in assessing the authenticity of historical documents. However, online catalogues offer only descriptive metadata for these documents and relegate discussions of their authenticity to free-text fields, which makes it difficult to study these assessments at scale. This study explores the generation of structured data about authenticity assessments of documents from natural language texts. Our pipeline uses Large Language Models (LLMs) to select, extract and classify relevant claims about the topic without the need for training, and Semantic Web technologies to structure and type-validate the LLMs' results. The final output is a catalogue of documents whose authenticity has been debated, along with scholars' opinions on their authenticity. This process can serve as a valuable resource for…
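The "structure and type-validate" step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the LLM returns one record per claim as a plain dictionary; the field names, the stance labels, the `http://example.org/` namespace and the predicate names are all hypothetical placeholders, not the authors' schema or ontology.

```python
from dataclasses import dataclass

# Hypothetical label set; the paper's actual classification scheme is not given here.
ALLOWED_STANCES = {"authentic", "forged", "uncertain"}
EX = "http://example.org/"  # placeholder namespace, not the authors' ontology


@dataclass(frozen=True)
class Claim:
    document: str
    scholar: str
    stance: str


def validate(raw: dict) -> Claim:
    """Type-check one LLM-produced record and reject anything off-schema."""
    for key in ("document", "scholar", "stance"):
        value = raw.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or non-string field: {key}")
    stance = raw["stance"].strip().lower()
    if stance not in ALLOWED_STANCES:
        raise ValueError(f"unknown stance: {raw['stance']!r}")
    return Claim(raw["document"].strip(), raw["scholar"].strip(), stance)


def slug(text: str) -> str:
    """Turn a free-text name into a simple identifier fragment."""
    return "_".join(text.lower().split())


def to_triples(claim: Claim, claim_id: str) -> list[tuple[str, str, str]]:
    """Express a validated claim as (subject, predicate, object) triples."""
    subject = f"{EX}claim/{claim_id}"
    return [
        (subject, f"{EX}aboutDocument", f"{EX}document/{slug(claim.document)}"),
        (subject, f"{EX}assertedBy", f"{EX}scholar/{slug(claim.scholar)}"),
        (subject, f"{EX}hasStance", f"{EX}stance/{claim.stance}"),
    ]
```

Calling `validate({"document": "Example Charter", "scholar": "Scholar A", "stance": "Forged"})` and passing the result to `to_triples` yields three triples for that claim, while a record with an unknown stance or a missing field raises `ValueError` instead of entering the catalogue. Rejecting off-schema records before they are structured is one plausible reading of "type-validate"; the paper may implement this differently.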
Taxonomy
Topics: Digital and Traditional Archives Management
