TL;DR
This paper introduces a new benchmark and a glyph-driven fine-tuning method to improve multimodal large language models' ability to analyze ancient Chinese script evolution.
Contribution
It constructs a comprehensive benchmark for ancient Chinese script evolution analysis and proposes a glyph-driven fine-tuning framework to enhance model performance.
Findings
Existing models show limited glyph-level comparison ability.
Performance on core tasks remains constrained without specialized fine-tuning.
Fine-tuning with GEVO improves performance across all evaluated tasks.
Abstract
In recent years, rapid advances in Multimodal Large Language Models (MLLMs) have increasingly stimulated research on ancient Chinese scripts. As the evolution of written characters constitutes a fundamental pathway for understanding cultural transformation and historical continuity, how MLLMs can be systematically leveraged to support and advance text evolution analysis remains an open and largely underexplored problem. To bridge this gap, we construct a comprehensive benchmark comprising 11 tasks and over 130,000 instances, specifically designed to evaluate the capability of MLLMs in analyzing the evolution of ancient Chinese scripts. We conduct extensive evaluations across multiple widely used MLLMs and observe that, while existing models demonstrate a limited ability in glyph-level comparison, their performance on core tasks, such as character recognition and evolutionary…
