Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models
Shraboni Sarker, Ahmad Tamim Hamad, Hulayyil Alshammari, Viviana Grieco, Praveen Rao

TL;DR
This paper introduces a dataset of 17th-century Spanish notary records and shows that Spanish language models fine-tuned on it outperform their pre-trained counterparts on several NLP tasks.
Contribution
The creation of a unique 17th-century Spanish notary record dataset for fine-tuning language models, enabling improved historical text analysis and NLP performance.
Findings
The dataset improves fine-tuning results for Spanish LLMs.
Fine-tuned models outperform pre-trained Spanish models.
The resource is publicly available for research.
Abstract
Large language models have gained tremendous popularity in domains such as e-commerce, finance, healthcare, and education. Fine-tuning is a common approach to customize an LLM on a domain-specific dataset for a desired downstream task. In this paper, we present a valuable resource for fine-tuning LLMs developed for the Spanish language to perform a variety of tasks such as classification, masked language modeling, clustering, and others. Our resource is a collection of handwritten notary records from the seventeenth century obtained from the National Archives of Argentina. This collection contains a combination of original images and transcribed text (and metadata) of 160+ pages that were handwritten by two notaries, namely, Estenban Agreda de Vergara and Nicolas de Valdivia y Brisuela nearly 400 years ago. Through empirical evaluation, we demonstrate that our collection can be used to…
Taxonomy
Topics: Historical Studies in Science · Historical Linguistics and Language Studies · Spanish Linguistics and Language Studies
