Curation of a Palaeohispanic Dataset for Machine Learning
Gonzalo Martínez-Fernández, Jose F Quesada, Agustín Riscos-Núñez, Francisco José Salguero-Lamillar

TL;DR
This paper presents a structured Palaeohispanic language dataset to facilitate machine learning research, addressing the scarcity and format issues of existing resources.
Contribution
The creation of a curated, machine-learning-ready dataset for Palaeohispanic languages, enabling computational analysis in a field with limited resources.
Findings
Dataset enables new computational analyses of Palaeohispanic languages
Structured data improves accessibility for machine learning applications
Supports further linguistic and archaeological research
Abstract
Palaeohispanic languages are those spoken in the Iberian Peninsula before the arrival of the Romans in the 3rd century B.C. Their study gained momentum after Gómez Moreno deciphered the Iberian Levantine script, one of several semi-syllabaries used to write these languages. Still, the Palaeohispanic languages have been deciphered to varying degrees, and none is fully understood to this day. Most studies have been carried out from a purely linguistic point of view, and a computational approach may benefit this research area greatly. However, the available resources are limited and presented in a format unsuitable for techniques such as Machine Learning. Therefore, a structured dataset is constructed, which will hopefully allow more progress in the field.
