Oracle Bone Inscriptions Multi-modal Dataset

Bang Li; Donghao Luo; Yujie Liang; Jing Yang; Zengmao Ding; Xu Peng; Boyuan Jiang; Shengwei Han; Dan Sui; Peichao Qin; Pian Wu; Chaoyang Wang; Yun Qi; Taisong Jin; Chengjie Wang; Xiaoming Huang; Zhan Shu; Rongrong Ji; Yongge Liu; Yunsheng Wu

arXiv:2407.03900 · cs.CV · July 8, 2024 · 3 cites

TL;DR

This paper introduces a comprehensive multi-modal dataset for Oracle Bone Inscriptions, enabling advanced AI research to assist in deciphering and analyzing this ancient Chinese writing system.

Contribution

The paper presents the first high-quality, multi-modal annotated dataset for OBI, supporting various AI tasks and advancing research in paleography and ancient script recognition.

Findings

01. Dataset includes 10,077 oracle bone images with detailed annotations.

02. Supports multiple AI tasks like detection, recognition, and sequence prediction.

03. Facilitates significant progress in AI-assisted decipherment of OBI.

Abstract

Oracle bone inscriptions (OBI) are the earliest developed writing system in China, bearing invaluable written evidence of early Shang history and paleography. However, deciphering OBI remains extremely challenging given the current state of scholarship: of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Leveraging advanced AI technology to assist in the decipherment of OBI is therefore an important research topic. Fully utilizing AI's capabilities, however, depends on having a comprehensive, high-quality annotated OBI dataset, whereas most existing datasets are annotated along only one or a few dimensions, which limits their potential applications. For instance, the Oracle-MNIST dataset offers only 30k images classified into 10 categories.…

Taxonomy

Topics: Natural Language Processing Techniques · Image Processing and 3D Reconstruction · Library Science and Information Systems