KoCHET: a Korean Cultural Heritage corpus for Entity-related Tasks
Gyeongmin Kim, Jinsung Kim, Junyoung Son, Heuiseok Lim

TL;DR
KoCHET is a comprehensive Korean cultural heritage corpus designed for entity recognition, relation extraction, and entity typing, facilitating research and practical applications in digital preservation of cultural documents.
Contribution
It introduces a large, expert-advised Korean cultural heritage corpus for key entity-related NLP tasks, with flexible redistribution rights for researchers.
Findings
High-quality, large-scale dataset for Korean cultural heritage entities
Enhanced practical usability demonstrated through experimental results
Provides valuable insights via statistical and linguistic analysis
Abstract
As digitized traditional cultural heritage documents have rapidly increased, the need for their preservation and management has grown, and practical recognition of entities and typification of their classes has become essential. To achieve this, we propose KoCHET, a Korean cultural heritage corpus for the typical entity-related tasks, i.e., named entity recognition (NER), relation extraction (RE), and entity typing (ET). Advised by cultural heritage experts based on the data construction guidelines of government-affiliated organizations, KoCHET consists of 112,362, 38,765, and 113,198 examples for the NER, RE, and ET tasks, respectively, covering all entity types related to Korean cultural heritage. Moreover, unlike the existing public corpora, modified redistribution is allowed for both domestic and foreign researchers. Our experimental results make the practical usability of KoCHET more…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management
