# Language and Dialect Identification of Cuneiform Texts

**Authors:** Tommi Jauhiainen, Heidi Jauhiainen, Tero Alstola, Krister Lindén

arXiv: 1903.01891 · 2019-03-14

## TL;DR

This paper presents a corpus of cuneiform texts, the CLI 2019 shared-task dataset derived from it, and baseline language identification experiments. To the authors' knowledge, these are the first experiments to apply automatic language identification methods to cuneiform data.

## Contribution

The paper introduces a corpus of cuneiform texts, describes how the Cuneiform Language Identification (CLI) 2019 shared-task dataset was derived from that corpus, and reports baseline language identification results on the CLI dataset.

## Key findings

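- To the authors' knowledge, the first use of automatic language identification methods on cuneiform data
- Baseline language identification results reported on the CLI dataset as a reference point for future research
- Description of the corpus and of how the CLI dataset was derived from it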

## Abstract

This article introduces a corpus of cuneiform texts from which the dataset for the use of the Cuneiform Language Identification (CLI) 2019 shared task was derived as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been used on cuneiform data.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.01891/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1903.01891/full.md

---
Source: https://tomesphere.com/paper/1903.01891