Exhaustive Entity Recognition for Coptic: Challenges and Solutions
Amir Zeldes, Lance Martin, Sichang Tu

TL;DR
This paper addresses the challenges of entity recognition in low-resource, morphologically complex Coptic texts, proposing solutions that leverage dependency parsing, CRF models, and semi-automatic linking to Wikipedia.
Contribution
It introduces methods for nested entity recognition and entity linking in Coptic, achieving high accuracy with orders of magnitude less data than is used for high-resource languages.
Findings
Effective semi-automatic entity linking to Wikipedia.
High-accuracy NER with minimal data.
Solutions applicable to other low-resource languages.
Abstract
Entity recognition provides semantic access to ancient materials in the Digital Humanities: it exposes people and places of interest in texts that cannot be read exhaustively, facilitates linking resources and can provide a window into text contents, even for texts with no translations. In this paper we present entity recognition for Coptic, the language of Hellenistic era Egypt. We evaluate NLP approaches to the task and lay out difficulties in applying them to a low-resource, morphologically complex language. We present solutions for named and non-named nested entity recognition and semi-automatic entity linking to Wikipedia, relying on robust dependency parsing, feature-based CRF models, and hand-crafted knowledge base resources, enabling high accuracy NER with orders of magnitude less data than those used for high resource languages. The results suggest avenues for research on other…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
Methods: Conditional Random Field
