KoCoNovel: Annotated Dataset of Character Coreference in Korean Novels
Kyuhee Kim, Surin Lee, Sangah Lee

TL;DR
KoCoNovel is a large annotated character coreference dataset built from Korean novels. It supports multiple analysis perspectives, captures features shaped by Korean address term culture, and improves coreference resolution model performance.
Contribution
The paper introduces KoCoNovel, the first Korean coreference dataset based on literary texts. It is released in four annotation versions and reflects features of Korean address term culture.
Findings
Enhanced coreference model performance with KoCoNovel
Supports multiple analysis perspectives and entity types
Highlights cultural influences on Korean coreference
Abstract
In this paper, we present KoCoNovel, a novel character coreference dataset derived from Korean literary texts, complete with detailed annotation guidelines. Comprising 178K tokens from 50 modern and contemporary novels, KoCoNovel stands as one of the largest public coreference resolution corpora in Korean, and the first to be based on literary texts. KoCoNovel offers four distinct versions to accommodate a wide range of literary coreference analysis needs. These versions are designed to support perspectives of the omniscient author or readers, and to manage multiple entities as either separate or overlapping, thereby broadening its applicability. One of KoCoNovel's distinctive features is that 24% of all character mentions are single common nouns, lacking possessive markers or articles. This feature is particularly influenced by the nuances of Korean address term culture, which favors…
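The four versions described in the abstract can be read as the combination of two binary choices: whose perspective the annotation follows (omniscient author or reader) and how multiple entities are handled (separate or overlapping). The following is a minimal sketch of that 2 x 2 layout. The class names, enum values, and label format are hypothetical and chosen only for illustration; the abstract does not state how the released versions are actually named or stored.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Perspective(Enum):
    # Hypothetical names for the two annotation perspectives in the abstract.
    OMNISCIENT = "omniscient"
    READER = "reader"


class EntityMode(Enum):
    # Hypothetical names for the two ways of handling multiple entities.
    SEPARATE = "separate"
    OVERLAPPING = "overlapping"


@dataclass(frozen=True)
class DatasetVersion:
    perspective: Perspective
    entity_mode: EntityMode

    @property
    def label(self) -> str:
        # Illustrative label only; not the dataset's real naming scheme.
        return f"{self.perspective.value}-{self.entity_mode.value}"


def all_versions() -> list[DatasetVersion]:
    # Cartesian product of the two axes gives the four versions.
    return [DatasetVersion(p, m) for p, m in product(Perspective, EntityMode)]


if __name__ == "__main__":
    for version in all_versions():
        print(version.label)
```

Modelling the versions as two independent axes, rather than four unrelated names, makes it straightforward to select, for example, every reader-perspective version when comparing models.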
Taxonomy
Topics: Computational and Text Analysis Methods · Diverse Approaches in Healthcare and Education Studies
