Diff-Oracle: Deciphering Oracle Bone Scripts with Controllable Diffusion Model

Jing Li, Qiu-Feng Wang, Siyuan Wang, Rui Zhang, Kaizhu Huang, and Erik Cambria

arXiv:2312.13631 · cs.CV · July 9, 2024 · 2 cites

Open Access

TL;DR

Diff-Oracle is a diffusion-based model with style and content control for generating oracle bone script images. It improves zero-shot recognition accuracy by 7.70% and reaches 84.62% accuracy on the OBC306 dataset.

Contribution

The paper presents Diff-Oracle, a diffusion model that incorporates style and content encoders to generate diverse, controllable oracle characters, addressing the scarcity of oracle character images that hinders deciphering.

Findings

01. Outperforms existing generative methods in image quality.

02. Achieves a 7.70% accuracy improvement in zero-shot recognition.

03. Sets a new benchmark with 84.62% accuracy on the OBC306 dataset.

Abstract

Deciphering oracle bone scripts plays an important role in Chinese archaeology and philology. However, a significant challenge remains due to the scarcity of oracle character images. To overcome this issue, we propose Diff-Oracle, a novel approach based on diffusion models to generate a diverse range of controllable oracle characters. Unlike traditional diffusion models that operate primarily on text prompts, Diff-Oracle incorporates a style encoder that utilizes style reference images to control the generation style. This encoder extracts style prompts from existing oracle character images, where style details are converted into a text embedding format via a pretrained language-vision model. On the other hand, a content encoder is integrated within Diff-Oracle to capture specific content details from content reference images, ensuring that the generated characters accurately represent…
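
As a rough illustration of the style-prompt idea described in the abstract, the toy sketch below maps one style feature vector to a short sequence of tokens that have the width of a text embedding, then bundles them with a content feature as conditioning. Everything in it is an assumption made for illustration: the dimensions, the random linear projections, the names `style_prompt` and `build_condition`, and the way the two conditions are combined. It does not reproduce the paper's encoders or its pretrained language-vision model.

```python
import random

# Hypothetical sizes, chosen only to keep the example small.
IMG_DIM, TXT_DIM, N_TOKENS = 8, 4, 3


def make_weight(d_out, d_in, rng):
    """Create a small random weight matrix with d_out rows and d_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def linear(x, weight):
    """Apply a weight matrix to a vector (one dot product per row)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def style_prompt(style_feature, projections):
    """Turn one style feature into N tokens, each of text-embedding width."""
    return [linear(style_feature, w) for w in projections]


def build_condition(style_feature, content_feature, projections):
    """Bundle the style tokens and the content feature (assumed interface)."""
    return {
        "style_prompt": style_prompt(style_feature, projections),
        "content": list(content_feature),
    }


rng = random.Random(0)
projections = [make_weight(TXT_DIM, IMG_DIM, rng) for _ in range(N_TOKENS)]

# Stand-ins for features extracted from style and content reference images.
style_feature = [rng.uniform(-1.0, 1.0) for _ in range(IMG_DIM)]
content_feature = [rng.uniform(-1.0, 1.0) for _ in range(IMG_DIM)]

condition = build_condition(style_feature, content_feature, projections)
print(len(condition["style_prompt"]), len(condition["style_prompt"][0]))  # prints: 3 4
```

In this sketch the style tokens occupy the slot a text prompt would normally fill, which is one plausible reading of "converted into a text embedding format"; how the content feature enters the denoising network is left open.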

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Contrastive Language-Image Pre-training