Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation

Jianing Zhang; Runan Li; Honglin Pang; Ding Xia; Zhou Zhu; Qian Zhang; Chuntao Li; Xi Yang

arXiv:2604.06711·cs.CV·April 9, 2026

TL;DR

The paper presents a multimodal framework that pairs a vision-language model with an LLM-based agent to interpret Oracle Bone Script through its recurring components; the authors report improved decipherment over baseline methods.

Contribution

The paper introduces an agent-driven VLM framework and the expert-annotated OB-Radix dataset, both aimed at improving structural understanding and semantic interpretation of ancient Chinese characters.

Findings

01. Outperforms baseline methods on multiple benchmarks.

02. Provides detailed and accurate decipherments.

03. Introduces a new dataset with structural and semantic annotations.

Abstract

Deciphering ancient Chinese Oracle Bone Script (OBS) is a challenging task that offers insights into the beliefs, systems, and culture of the ancient era. Existing approaches treat decipherment as a closed-set image recognition problem, which fails to bridge the "interpretation gap": while individual characters are often unique and rare, they are composed of a limited set of recurring, pictographic components that carry transferable semantic meanings. To leverage this structural logic, we propose an agent-driven Vision-Language Model (VLM) framework that integrates a VLM for precise visual grounding with an LLM-based agent to automate a reasoning chain of component identification, graph-based knowledge retrieval, and relationship inference for linguistically accurate interpretation. To support this, we also introduce OB-Radix, an expert-annotated dataset providing structural and…
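
The reasoning chain named in the abstract (component identification, graph-based knowledge retrieval, relationship inference) can be pictured with a small sketch. Everything below is an assumption made for illustration: the `Component` schema, the toy graph entries, the `MOCK_DETECTIONS` lookup that stands in for the VLM grounding step, and all function names are hypothetical and are not taken from the paper or from OB-Radix.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Component:
    """One node in a toy component knowledge graph (hypothetical schema)."""
    name: str
    gloss: str
    related: list[str] = field(default_factory=list)


# Hypothetical graph entries for illustration only; not taken from OB-Radix.
KNOWLEDGE_GRAPH: dict[str, Component] = {
    "person": Component("person", "a standing human figure", ["tree"]),
    "tree": Component("tree", "a tree or wood", ["person"]),
    "mouth": Component("mouth", "an opening or speech"),
}

# Stand-in for the VLM grounding step: a fixed lookup instead of a model call.
MOCK_DETECTIONS: dict[str, list[str]] = {"glyph_001": ["person", "tree"]}


def identify_components(glyph_id: str) -> list[str]:
    """Step 1: return component labels detected in the glyph (mocked)."""
    return MOCK_DETECTIONS.get(glyph_id, [])


def retrieve_knowledge(names: list[str]) -> list[Component]:
    """Step 2: fetch the graph node for each known component label."""
    return [KNOWLEDGE_GRAPH[n] for n in names if n in KNOWLEDGE_GRAPH]


def infer_relations(components: list[Component]) -> list[tuple[str, str]]:
    """Step 3: keep graph edges whose two endpoints both occur in the glyph."""
    present = {c.name for c in components}
    pairs = {
        tuple(sorted((c.name, other)))
        for c in components
        for other in c.related
        if other in present
    }
    return sorted(pairs)


def build_prompt(glyph_id: str, components: list[Component],
                 relations: list[tuple[str, str]]) -> str:
    """Assemble the retrieved evidence into text an LLM could reason over."""
    lines = [f"Glyph: {glyph_id}"]
    lines += [f"- component {c.name}: {c.gloss}" for c in components]
    lines += [f"- linked in graph: {a} + {b}" for a, b in relations]
    lines.append("Propose an interpretation from the components above.")
    return "\n".join(lines)


def interpret(glyph_id: str) -> str:
    """Run the three steps in order and return the assembled prompt."""
    names = identify_components(glyph_id)
    components = retrieve_knowledge(names)
    relations = infer_relations(components)
    return build_prompt(glyph_id, components, relations)


if __name__ == "__main__":
    print(interpret("glyph_001"))
```

The sketch stops at prompt assembly on purpose: how the actual agent queries its graph and how the LLM produces the final interpretation are not specified in the text above, so no model call is shown.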
