Deep Learning for Classical Japanese Literature
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, David Ha

TL;DR
This paper introduces new datasets for machine learning focused on classical Japanese cursive script, aiming to bridge ML research with cultural and historical literature analysis.
Contribution
It presents the Kuzushiji-MNIST, Kuzushiji-49, and Kuzushiji-Kanji datasets to promote ML applications in understanding classical Japanese literature.
Findings
The datasets enable recognition of classical Japanese scripts.
They facilitate ML research on culturally relevant tasks.
They encourage engagement with historical literature through ML.
Abstract
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the perspective of ML researchers, the content of the task itself is largely irrelevant, and thus there have increasingly been calls for benchmark tasks to more heavily focus on problems which are of social or cultural relevance. In this work, we introduce Kuzushiji-MNIST, a dataset which focuses on Kuzushiji (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji. Through these datasets, we wish to engage the machine learning community with the world of classical Japanese literature. Dataset available at https://github.com/rois-codh/kmnist
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques
