Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Zhenran Xu; Zifei Shan; Yuxin Li; Baotian Hu; Bing Qin

arXiv:2207.13005 · cs.CL · October 31, 2023 · 1 citation

TL;DR

Hansel is a new Chinese benchmark dataset for few-shot and zero-shot entity linking, highlighting challenges in tail and emerging entities, and providing a basis for evaluating and improving EL systems in non-English contexts.

Contribution

The paper introduces Hansel, a novel Chinese EL benchmark dataset with human-annotated test sets and a new data collection method for zero-shot EL, addressing a gap in non-English entity linking resources.

Findings

01. The existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot).
02. The proposed baseline reaches an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot.
03. The baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.

Abstract

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL challenges. The test set of Hansel is human-annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents in news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task.
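The R@1 figures quoted above are recall-at-1 scores: the share of mentions for which the top-ranked candidate is the gold entity. The following is a minimal sketch of that computation, not the authors' evaluation script. The function name `recall_at_k`, the dictionary layout, and the mention and entity IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def recall_at_k(gold: Dict[str, str], ranked: Dict[str, List[str]], k: int = 1) -> float:
    """Fraction of mentions whose gold entity ID appears in the top-k candidates.

    gold   maps a mention ID to its gold entity ID.
    ranked maps a mention ID to candidate entity IDs, best first.
    Mentions missing from `ranked` count as misses.
    """
    if not gold:
        raise ValueError("gold must contain at least one mention")
    hits = sum(1 for mention, entity in gold.items()
               if entity in ranked.get(mention, [])[:k])
    return hits / len(gold)


# Toy data (made up; the IDs are placeholders, not real Wikidata entries).
gold = {"m1": "Q1", "m2": "Q2", "m3": "Q3", "m4": "Q4"}
ranked = {
    "m1": ["Q1", "Q9"],  # correct at rank 1
    "m2": ["Q8", "Q2"],  # correct only at rank 2
    "m3": ["Q3"],        # correct at rank 1
    "m4": [],            # no candidates retrieved
}

r1 = recall_at_k(gold, ranked, k=1)
r2 = recall_at_k(gold, ranked, k=2)
print(f"R@1 = {r1:.2%}, R@2 = {r2:.2%}")
```

On this toy input two of four mentions are resolved at rank 1, so R@1 is 50%; a reported score such as 36.6% would be read the same way over the full test set.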

Code & Models

Repositories

HITsz-TMG/Hansel (Official)

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Machine Learning in Healthcare

Methods: Test