BIRD: Bronze Inscription Restoration and Dating
Wenjie Hua, Hoang H. Nguyen, Gangyan Ge

TL;DR
This paper presents BIRD, a new dataset of bronze inscriptions from early China, together with a specialized language model that integrates domain knowledge and glyph information to improve the restoration and dating of fragmentary inscriptions.
Contribution
The paper introduces BIRD, a fully encoded dataset, and an allograph-aware masked language model with a Glyph Net (GN) for better restoration and dating of bronze inscriptions.
Findings
Glyph Net enhances restoration accuracy
Glyph-biased sampling improves dating performance
Allograph-aware modeling benefits inscription analysis
Abstract
Bronze inscriptions from early China are fragmentary and difficult to date. We introduce BIRD (Bronze Inscription Restoration and Dating), a fully encoded dataset grounded in standard scholarly transcriptions and chronological labels. We further propose an allograph-aware masked language modeling framework that integrates domain- and task-adaptive pretraining with a Glyph Net (GN), which links graphemes and allographs. Experiments show that GN improves restoration, while glyph-biased sampling yields gains in dating.
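The abstract does not spell out how the Glyph Net or glyph-biased sampling are built, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a Glyph Net can be approximated as a lookup table linking each grapheme to its allographs, and that "glyph-biased" means tokens present in that table are more likely to be chosen for masking. The names `GlyphNet`, `mask_weights`, `sample_mask_positions`, the `bias` parameter, and the placeholder tokens (`G1`, `G1_a`, ...) are all hypothetical.

```python
import random
from collections import defaultdict


class GlyphNet:
    """Toy grapheme <-> allograph link table (hypothetical structure)."""

    def __init__(self, links):
        # links: iterable of (grapheme, allograph) pairs
        self.allographs_of = defaultdict(set)
        self.grapheme_of = {}
        for grapheme, allograph in links:
            self.allographs_of[grapheme].add(allograph)
            self.grapheme_of[allograph] = grapheme

    def canonical(self, token):
        # Map an allograph to its grapheme; unknown tokens map to themselves.
        return self.grapheme_of.get(token, token)

    def same_grapheme(self, a, b):
        # Two tokens are equivalent if they resolve to the same grapheme.
        return self.canonical(a) == self.canonical(b)


def mask_weights(tokens, net, bias=3.0):
    """Assumed biasing rule: tokens linked in the net get weight `bias`, others 1.0."""
    return [
        bias if (t in net.grapheme_of or t in net.allographs_of) else 1.0
        for t in tokens
    ]


def sample_mask_positions(tokens, net, k, rng, bias=3.0):
    """Draw up to k distinct positions to mask, without replacement, using the weights."""
    positions = list(range(len(tokens)))
    weights = mask_weights(tokens, net, bias)
    chosen = []
    for _ in range(min(k, len(positions))):
        pick = rng.choices(positions, weights=weights, k=1)[0]
        idx = positions.index(pick)
        positions.pop(idx)
        weights.pop(idx)
        chosen.append(pick)
    return sorted(chosen)


# Placeholder tokens only; no real inscription data is used here.
net = GlyphNet([("G1", "G1_a"), ("G1", "G1_b")])
tokens = ["G1_a", "X", "G1", "Y"]
masked = sample_mask_positions(tokens, net, k=2, rng=random.Random(0))
```

In this sketch, `same_grapheme` lets a restoration prediction be compared with the gold character at the grapheme level, so an allograph of the correct character is not counted as an error. Whether the paper evaluates or samples this way is not stated in the abstract.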
Taxonomy
Topics: Image Processing and 3D Reconstruction · Language and cultural evolution · Big Data and Digital Economy
