Deep Learning for Classical Japanese Literature

Tarin Clanuwat; Mikel Bober-Irizar; Asanobu Kitamoto; Alex Lamb; Kazuaki Yamamoto; David Ha

arXiv:1812.01718·cs.CV·December 6, 2018·486 cites

TL;DR

This paper introduces new datasets for machine learning focused on classical Japanese cursive script, aiming to bridge ML research with cultural and historical literature analysis.

Contribution

It presents the Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji datasets to promote ML applications in understanding classical Japanese literature.

Findings

01 The datasets enable recognition of classical Japanese scripts.

02 They facilitate ML research on culturally relevant tasks.

03 They encourage engagement with historical literature through ML.

Abstract

Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the perspective of ML researchers, the content of the task itself is largely irrelevant, and thus there have increasingly been calls for benchmark tasks to more heavily focus on problems which are of social or cultural relevance. In this work, we introduce Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji. Through these datasets, we wish to engage the machine learning community with the world of classical Japanese literature. The datasets are available at https://github.com/rois-codh/kmnist

Code & Models

Datasets

randall-lab/kmnist
dataset · 39 downloads

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques