TL;DR
HieroSA is a novel framework that enables multimodal models to analyze hieroglyphic characters at the stroke level without language-specific prior knowledge.
Contribution
It introduces a generalizable method that extracts stroke-level structures directly from hieroglyph images, enhancing structural understanding across scripts.
Findings
HieroSA effectively captures internal character structures and semantics.
The method generalizes across modern and ancient hieroglyphs.
Experimental results show improved structural analysis without handcrafted data.
Abstract
Hieroglyphs, as logographic writing systems, encode rich semantic and cultural information within their internal structural composition. Yet current advanced Large Language Models (LLMs) and Multimodal LLMs (MLLMs) usually remain structurally blind to this information: LLMs process characters as textual tokens, while MLLMs additionally view them as raw pixel grids. Both fall short of modeling the underlying logic of character strokes. Furthermore, existing structural analysis methods are often script-specific and labor-intensive. In this paper, we propose Hieroglyphic Stroke Analyzer (HieroSA), a novel and generalizable framework that enables MLLMs to automatically derive stroke-level structures from character bitmaps without handcrafted data. It transforms modern logographic and ancient hieroglyphic character images into explicit, interpretable line-segment representations in a normalized…
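To make the idea of a line-segment representation concrete, here is a minimal Python sketch of one way a character's strokes could be stored as segments and rescaled from bitmap pixels. It is an illustration only, not HieroSA's actual data format: the abstract is cut off before it specifies the coordinate convention, so the unit-square scaling, the `Segment` class, the `normalize_segments` helper, and the hand-written pixel coordinates are all assumptions.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Segment:
    """One straight stroke piece, endpoints in normalized coordinates (assumed [0, 1])."""
    x1: float
    y1: float
    x2: float
    y2: float

    def length(self) -> float:
        # Euclidean length of the segment in normalized units.
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)


def normalize_segments(segments_px, width, height):
    """Rescale pixel-space segments (x1, y1, x2, y2) to the unit square.

    Hypothetical helper: divides x by the bitmap width and y by its height.
    """
    return [
        Segment(x1 / width, y1 / height, x2 / width, y2 / height)
        for (x1, y1, x2, y2) in segments_px
    ]


# Hand-written example: a cross-shaped character (like "十") on a 64x64 bitmap,
# described as one horizontal and one vertical stroke.
strokes_px = [(8, 32, 56, 32), (32, 8, 32, 56)]
strokes = normalize_segments(strokes_px, width=64, height=64)

for s in strokes:
    print(s, round(s.length(), 3))
```

A resolution-independent form like this lets the same character rendered at different bitmap sizes map to the same segment list, which is one plausible reason to normalize coordinates at all.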
