Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark
Zhenran Xu, Zifei Shan, Yuxin Li, Baotian Hu, Bing Qin

TL;DR
Hansel is a new Chinese benchmark dataset for few-shot and zero-shot entity linking, highlighting challenges in tail and emerging entities, and providing a basis for evaluating and improving EL systems in non-English contexts.
Contribution
The paper introduces Hansel, a novel Chinese EL benchmark dataset with human-annotated test sets and a new data collection method for zero-shot EL, addressing a gap in non-English entity linking resources.
Findings
The existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot).
The authors' baseline reaches an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot.
The baseline also achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
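The R@1 figures above are recall at rank 1: the share of mentions whose top-ranked candidate is the gold entity. The following is a minimal sketch of how such a score could be computed, not the authors' evaluation script. The function name `recall_at_k`, the input layout, and the toy entity IDs are assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    gold_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of mentions whose gold entity is among the top-k candidates.

    ranked_candidates[i] is the system's ranked list of entity IDs for
    mention i (best first); gold_ids[i] is the annotated entity ID.
    """
    if len(ranked_candidates) != len(gold_ids):
        raise ValueError("predictions and gold labels must align one-to-one")
    if not gold_ids:
        return 0.0
    hits = sum(
        1
        for candidates, gold in zip(ranked_candidates, gold_ids)
        if gold in candidates[:k]
    )
    return hits / len(gold_ids)


# Toy data with made-up entity IDs (hypothetical, not taken from Hansel).
predictions = [
    ["Q1", "Q2"],  # gold is ranked first: hit at k=1
    ["Q4", "Q3"],  # gold is ranked second: miss at k=1, hit at k=2
    ["Q5"],        # gold is ranked first: hit at k=1
    ["Q7", "Q8"],  # gold is absent: miss at any k
]
gold = ["Q1", "Q3", "Q5", "Q6"]

r_at_1 = recall_at_k(predictions, gold, k=1)
r_at_2 = recall_at_k(predictions, gold, k=2)
```

On this toy input two of four mentions are resolved at rank 1, so `r_at_1` is 0.5, and widening to the top two candidates raises `r_at_2` to 0.75.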
Abstract
Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the vacancy of non-English few-shot and zero-shot EL challenges. The test set of Hansel is human annotated and reviewed, created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores a R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on TAC-KBP2015 Chinese Entity Linking task.
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Machine Learning in Healthcare
Methods: Test
