TL;DR
This paper introduces an interpretable deep learning approach using Large Vision-Language Models for deciphering Oracle Bone Script, combining radical and pictographic analysis to improve accuracy and interpretability, especially in zero-shot scenarios.
Contribution
The paper presents an interpretable decipherment method that integrates radical and pictographic analysis through a progressive training strategy, and introduces a new dataset (PD-OBS) to improve zero-shot decipherment and generalization.
Findings
Achieves state-of-the-art Top-10 accuracy on benchmarks.
Enhances zero-shot decipherment performance.
Provides logical analysis processes for archaeological insights.
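The Top-10 metric cited above counts a prediction as correct when the true character appears anywhere among the model's ten highest-ranked candidates, whereas Top-1 requires it to be ranked first. A minimal sketch of that computation follows, assuming each prediction is a ranked list of candidate characters. The function name `top_k_accuracy` and the toy data are illustrative assumptions and do not come from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of samples whose target is among the first k ranked candidates."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Toy data, made up for illustration only (not results from the paper).
ranked = [
    ["日", "月", "目"],  # target ranked first
    ["水", "川", "雨"],  # target ranked second
    ["木", "林", "禾"],  # target ranked third
    ["人", "大", "天"],  # target missing from the candidates
]
gold = ["日", "川", "禾", "女"]

top1 = top_k_accuracy(ranked, gold, k=1)
top3 = top_k_accuracy(ranked, gold, k=3)
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

On this toy data the two metrics diverge: a model can place the correct character in its shortlist often while rarely ranking it first, so a Top-10 score and a Top-1 score answer different questions.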
Abstract
As the oldest mature writing system, Oracle Bone Script (OBS) has long posed significant challenges for archaeological decipherment due to its rarity, abstractness, and pictographic diversity. Current deep learning-based methods have made exciting progress on the OBS decipherment task, but existing approaches often ignore the intricate connections between glyphs and the semantics of OBS. This results in limited generalization and interpretability, especially when addressing zero-shot settings and undeciphered OBS. To this end, we propose an interpretable OBS decipherment method based on Large Vision-Language Models, which synergistically combines radical analysis and pictograph-semantic understanding to bridge the gap between glyphs and meanings of OBS. Specifically, we propose a progressive training strategy that guides the model from radical recognition and analysis to pictographic…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The progressive training design proposed in the paper aligns with the cognitive logic of oracle bone script, following a training sequence from radicals to pictograms and then to interactions. This sequence is consistent with the evolutionary rule of oracle bone script, which progresses from shape construction to semantic expression, thus avoiding semantic confusion caused by directly analyzing complete characters. 2. The constructed PD-OBS dataset includes detailed radical and pictogram ana
1. The paper points out that supervised fine-tuning restricts the model's generalization and inference capabilities to some extent. However, in the task of oracle bone script decipherment, there are many characters with similar shapes or meanings. In zero-shot testing scenarios, the model might rely on information from similar characters in the training labels, rather than performing a complete analysis of radicals and pictograms. 2. The method proposed in the paper is primarily based on large
1) An Interpretable Pipeline Based on OBS Structure: The paper introduces a well-structured and interpretable pipeline for decipherment. The methodology is built upon a progressive, three-stage analysis that begins with radical analysis, proceeds to pictographic analysis, and culminates in a mutual analysis stage. This approach is explicitly defined and logically motivated, improving the semantic alignment between ancient glyphs and their modern meanings. The proposed methodology is straightforw
1) Limitations in Performance and Incomplete Comparisons: (1.1) The reported performance, while notable in certain metrics, reveals some limitations. For instance, in the validation setting on the HUST-OBC and EVOBC datasets, the model's Top-1 accuracy is below that of the PyGT baseline. In the zero-shot setting on the HUST-OBC dataset, the model's Top-1 accuracy is slightly below that of the OBSD baseline, with its primary advantage appearing in the Top-10 results (Table 1). This sugges
1. The use of LVLMs, combined with radical and pictograph analysis, provides excellent interpretability for deciphering Oracle Bone Script. 2. A progressive training strategy is proposed, gradually transitioning from the radical features of Oracle Bone Script to pictograph analysis, ultimately obtaining a more comprehensive understanding of glyph semantics. 3. The PD-OBS dataset is introduced, containing detailed pictograph analysis annotations for 3,173 types of Oracle Bone Script and 47,157 ty
1. The paper claims to be the first method to explain the deciphering process through pictograph analysis. However, in reality, both OracleSage and OracleFusion have used Oracle Bone Script components and pictograph information for deciphering. I believe the main contribution of this paper is the construction of a new pipeline that integrates radical and pictograph information for deciphering, rather than being the first to propose it. The term "first" may be misleading and could be off-putting.
