Reasoning Over the Glyphs: Evaluation of LLM's Decipherment of Rare Scripts

Yu-Fei Shih; Zheng-Lin Lin; Shu-Kai Hsieh

arXiv:2501.17785 · cs.CL · January 30, 2025

TL;DR

This paper evaluates how well large language models and vision-language models can decipher rare scripts that are not encoded in Unicode, introducing a new multimodal dataset and evaluation methods to assess their capabilities and limitations.

Contribution

It presents a novel multimodal dataset and methods for evaluating LLMs and LVLMs on deciphering rare scripts, highlighting current strengths and challenges.

Findings

1. Models show limited success in deciphering scripts that lack Unicode encoding.
2. Unicode encoding significantly affects model performance.
3. Modeling visual language tokens through textual descriptions remains a key challenge.

Abstract

We explore the capabilities of large vision-language models (LVLMs) and large language models (LLMs) in deciphering rare scripts that are not encoded in Unicode. We introduce a novel approach to constructing a multimodal dataset of linguistic puzzles involving such scripts, using a tokenization method for language glyphs. Our methods include the Picture Method for LVLMs and the Description Method for LLMs, which enable these models to tackle such puzzles. We run experiments on these linguistic puzzles with prominent models: GPT-4o, Gemini, and Claude 3.5 Sonnet. Our findings reveal the strengths and limitations of current AI methods in linguistic decipherment, highlighting the impact of Unicode encoding on model performance and the challenges of modeling visual language tokens through descriptions. Our study advances understanding of AI's potential in linguistic decipherment and underscores the need for further research.
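The abstract names a glyph tokenization step and a Description Method without detailing either. As a rough illustration only, and not the authors' implementation, one way to make glyphs that have no Unicode code point usable by a text-only model is to give each distinct glyph a placeholder token and pair that token with a human-written description of its shape. The class name `GlyphVocabulary`, the `<G0>`-style token format, and the glyph keys and descriptions below are all hypothetical.

```python
class GlyphVocabulary:
    """Hypothetical vocabulary for glyphs that have no Unicode code point.

    Each distinct glyph (identified by an arbitrary key, e.g. a hash of its
    cropped image) receives a stable placeholder token such as "<G0>", and
    each token is paired with a textual description of the glyph's shape.
    This is an illustrative sketch, not the paper's tokenization method.
    """

    def __init__(self) -> None:
        self._token_by_key: dict[str, str] = {}  # glyph key -> placeholder token
        self._description: dict[str, str] = {}   # placeholder token -> description

    def add(self, glyph_key: str, description: str) -> str:
        """Register a glyph and return its token; re-adding a known key
        keeps the first token and the first description."""
        if glyph_key not in self._token_by_key:
            token = f"<G{len(self._token_by_key)}>"
            self._token_by_key[glyph_key] = token
            self._description[token] = description
        return self._token_by_key[glyph_key]

    def tokenize(self, glyph_keys: list[str]) -> list[str]:
        """Map a sequence of glyph keys to their placeholder tokens."""
        return [self._token_by_key[key] for key in glyph_keys]

    def describe(self, tokens: list[str]) -> str:
        """Render tokens as plain text, one description per glyph,
        separated by " | " so a text-only model can read the sequence."""
        return " | ".join(self._description[token] for token in tokens)


if __name__ == "__main__":
    vocab = GlyphVocabulary()
    vocab.add("img_a", "vertical stroke with a dot on top")
    vocab.add("img_b", "circle crossed by a horizontal line")
    tokens = vocab.tokenize(["img_a", "img_b", "img_a"])
    print(tokens)  # ['<G0>', '<G1>', '<G0>']
    print(vocab.describe(tokens))
```

Keeping the token layer separate from the descriptions lets the same puzzle be presented either as opaque tokens or as descriptions; the abstract does not say whether the paper's own tokenization works this way.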
