LogogramNLP: Comparing Visual and Textual Representations of Ancient Logographic Writing Systems for NLP

Danlu Chen; Freda Shi; Aditi Agarwal; Jacobo Myerston; Taylor Berg-Kirkpatrick

arXiv:2408.04628·cs.CL·January 29, 2026

TL;DR

This paper introduces LogogramNLP, a benchmark for analyzing ancient logographic languages using visual and textual data, showing visual methods can outperform text-based ones for certain NLP tasks.

Contribution

It presents the first benchmark for NLP on ancient logographic languages, comparing visual and textual representations and demonstrating the potential of visual processing.

Findings

01. Visual representations outperform textual ones on some tasks

02. The benchmark includes datasets for classification, translation, and parsing

03. Visual processing can unlock cultural heritage data for NLP

Abstract

Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor-intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form due to the absence of transcription -- this issue poses a bottleneck for researchers seeking to apply NLP toolkits to study ancient logographic languages: most of the relevant data are images of writing. This paper investigates whether direct processing of visual representations of language offers a potential solution. We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages, featuring both transcribed and visual datasets for four writing systems…

Taxonomy

Topics: Natural Language Processing Techniques · Language, Metaphor, and Cognition