Oracle Bone Inscriptions Multi-modal Dataset
Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

TL;DR
This paper introduces a comprehensive multi-modal dataset for Oracle Bone Inscriptions, enabling advanced AI research to assist in deciphering and analyzing this ancient Chinese writing system.
Contribution
The paper presents the first high-quality, multi-modal annotated dataset for OBI, supporting various AI tasks and advancing research in paleography and ancient script recognition.
Findings
Dataset includes 10,077 oracle bone images with detailed annotations.
Supports multiple AI tasks like detection, recognition, and sequence prediction.
Facilitates significant progress in AI-assisted decipherment of OBI.
Abstract
Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging the advantages of advanced AI technology to assist in the decipherment of OBI is a highly essential research topic. However, fully utilizing AI's capabilities in these matters is reliant on having a comprehensive and high-quality annotated OBI dataset at hand, whereas most existing datasets are annotated in only one or a few dimensions, limiting the value of their potential application. For instance, the Oracle-MNIST dataset only offers 30k images classified into 10 categories.…
Taxonomy
Topics: Natural Language Processing Techniques · Image Processing and 3D Reconstruction · Library Science and Information Systems
