ProtoSnap: Prototype Alignment for Cuneiform Signs
Rachel Mikulinsky, Morris Alper, Shai Gordin, Enrique Jiménez, Yoram Cohen, Hadar Averbuch-Elor

TL;DR
ProtoSnap is an unsupervised method that aligns prototype skeletons to complex cuneiform signs, improving recognition accuracy and enabling synthetic data generation for rare signs by modeling their internal structure.
Contribution
It introduces an unsupervised approach leveraging generative models and prototype fonts to recover detailed internal configurations of cuneiform signs, enhancing recognition and data synthesis.
Findings
Successfully aligns prototype skeletons to diverse cuneiform signs
Improves recognition accuracy, especially for rare signs
Enables generation of structurally correct synthetic data
Abstract
The cuneiform writing system served as the medium for transmitting knowledge in the ancient Near East for a period of over three thousand years. Cuneiform signs have a complex internal structure which is the subject of expert paleographic analysis, as variations in sign shapes bear witness to historical developments and transmission of writing and culture over time. However, prior automated techniques mostly treat sign types as categorical and do not explicitly model their highly varied internal configurations. In this work, we present an unsupervised approach for recovering the fine-grained internal configuration of cuneiform signs by leveraging powerful generative models and the appearance and structure of prototype font images as priors. Our approach, ProtoSnap, enforces structural consistency on matches found with deep image features to estimate the diverse configurations of…
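As a rough illustration of what "matches found with deep image features" can look like in practice, the sketch below builds a dense similarity volume between two feature maps, picks the best match for each prototype keypoint, and fits a single global affine map to those matches. This is not the authors' implementation: the function names, the random arrays standing in for deep features, and the plain least-squares fit are all assumptions made for illustration.

```python
import numpy as np


def similarity_volume(feat_a, feat_b):
    """Cosine similarity between every location of feat_a (H1, W1, C)
    and every location of feat_b (H2, W2, C); result is (H1, W1, H2, W2)."""
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    return np.einsum("ijc,klc->ijkl", a, b)


def match_keypoints(volume, keypoints):
    """For each (row, col) keypoint in image A, return the (row, col)
    in image B with the highest similarity."""
    w2 = volume.shape[3]
    matches = []
    for r, c in keypoints:
        flat = int(np.argmax(volume[r, c]))
        matches.append(divmod(flat, w2))
    return matches


def fit_affine(src, dst):
    """Least-squares affine map M of shape (3, 2) with dst ≈ [src, 1] @ M.
    Hypothetical stand-in for a structural-consistency step."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return M


if __name__ == "__main__":
    # Random arrays stand in for deep features (assumption for the demo).
    rng = np.random.default_rng(0)
    prototype_feats = rng.normal(size=(4, 5, 8))
    # The "target" is the prototype shifted by one row and two columns.
    target_feats = np.roll(prototype_feats, shift=(1, 2), axis=(0, 1))

    volume = similarity_volume(prototype_feats, target_feats)
    keypoints = [(0, 0), (1, 1), (2, 0)]
    matches = match_keypoints(volume, keypoints)
    print(matches)
    print(fit_affine(keypoints, matches))
```

Fitting one global transform to per-keypoint matches is a simple way to discard matches that look plausible locally but disagree with the overall layout of the sign. A robust estimator such as RANSAC would be the usual replacement for the plain least-squares fit when some matches are outliers.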
Peer Reviews
Decision: ICLR 2025 Poster
Originality: ProtoSnap's use of deep diffusion features and skeleton-based prototypes for unsupervised cuneiform sign alignment is novel. Quality: The overall flow of the methodology section is sound and logical. Clarity: The paper is clear on the whole. The introduction covers the cuneiform research background and the limitations of existing methods, which leads naturally to the research objective, i.e., to propose the ProtoSnap method to solve the problem of analysing the inte
1. While this paper presents a new benchmark for evaluation, the current dataset may not be fully representative of the variety of cuneiform symbol variants and writing conditions present in the historical record. 2. The superiority of the method proposed in this paper is not reflected in the related work. 3. 4D similarity volumes in section 4.1 are not clearly described. 4. While the method shows promise for cuneiform signs, its adaptation to other ancient writing systems or complex symbol sets
+ The application of unsupervised learning and prototype alignment to cuneiform signs is novel and shows significant potential. + The technical approach is sound, utilizing SoTA techniques in image processing and machine learning. + The method has clear applications in digital humanities, aiding the decipherment and study of ancient texts.
1. There’s a potential risk that the method could overfit to the prototypes it has been trained on, especially if those prototypes do not capture the full variability of the signs in the dataset. 2. It would be good if the authors could report on the computational resources required for implementing the ProtoSnap method. Considering that it involves deep learning models and generative processes for aligning prototypes with actual images, understanding the computational demands is crucial. 3. I
1. The authors have done a good job presenting their method to a reader unfamiliar with the subject. The paper is well written and the ideas are well presented. 2. The method is novel for cuneiform sign alignment, as it adopts a common tactic from pose/keypoint detection problems in the scope of the presented subject. 3. The method achieves SoTA results in cuneiform sign alignment, although a more detailed comparison scheme could have been designed (see weakness 1). 4. The method ach
1. Comparisons in Table 1 are not clear. To my understanding, for DINOv2 and DIFT the authors directly decide keypoints based on feature similarity without solving RANSAC. On the other hand, the authors employ RANSAC for SIFT features and their method (with or without refinement). In my view, the authors should not focus on a single model for feature extraction, but rather experiment with all of them in the same setting (with or without RANSAC) and present their method as a more general method f
Taxonomy
Topics: Classical Antiquity Studies · Archaeological Research and Protection · Ancient Mediterranean Archaeology and History
