Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves
Zihao Wan, Pau Tong Lin Xu, Fuwen Luo, Ziyue Wang, Peng Li, Yang Liu

TL;DR
This paper introduces a model that decompiles raster images of pictographic characters into geometric programs. Its zero-shot reconstruction of ancient scripts outperforms strong baselines, which suggests that it captures geometric structure rather than only pixel patterns.
Contribution
The paper presents an approach that interprets pictographic characters as geometric programs, enabling zero-shot reconstruction of ancient scripts and revealing geometric understanding that transfers across scripts.
Findings
The model outperforms strong zero-shot baselines, including GPT-4o.
Trained solely on modern Chinese characters, it reconstructs ancient Oracle Bone Script in a zero-shot setting.
Demonstrates transfer of geometric understanding across scripts.
Abstract
While Vision-language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for this capability. We formulate this visual recognition challenge in the mathematical domain, where each character is represented by an executable program of geometric primitives. This is framed as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves. Our model, acting as a "visual decompiler", demonstrates performance superior to strong zero-shot baselines, including GPT-4o. The most significant finding is that when trained solely on modern Chinese characters, the model is able to reconstruct ancient Oracle Bone Script in a zero-shot context.…
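To make the idea of a character as "an executable program of geometric primitives" concrete, the following is a minimal sketch, not the authors' program format or pipeline. It assumes a program is simply a list of cubic Bézier curves with control points in the unit square, and that "executing" it means sampling each curve onto a small binary grid. The names `bezier_point`, `render`, and the two-stroke `program` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Curve = Tuple[Point, Point, Point, Point]  # cubic Bézier: P0, P1, P2, P3


def bezier_point(curve: Curve, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = curve
    s = 1.0 - t
    # Degree-3 Bernstein basis weights
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )


def render(program: List[Curve], size: int = 16, samples: int = 200) -> List[List[int]]:
    """Rasterize a list of curves (unit-square coordinates) to a binary grid."""
    grid = [[0] * size for _ in range(size)]
    for curve in program:
        for i in range(samples + 1):
            x, y = bezier_point(curve, i / samples)
            # Clamp to the grid so points on the boundary stay in range
            col = min(size - 1, max(0, int(x * size)))
            row = min(size - 1, max(0, int(y * size)))
            grid[row][col] = 1
    return grid


# Hypothetical two-stroke program: one horizontal and one vertical stroke (a cross)
program: List[Curve] = [
    ((0.1, 0.55), (0.4, 0.55), (0.6, 0.55), (0.9, 0.55)),
    ((0.55, 0.1), (0.55, 0.4), (0.55, 0.6), (0.55, 0.9)),
]

image = render(program)
for row in image:
    print("".join("#" if v else "." for v in row))
```

In this framing, the decompilation task described in the abstract runs in the opposite direction: given an image like `image`, predict a curve list like `program` whose rendering matches it.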
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
