Chinese-SkillSpan: A Span-Level Dataset for ESCO-Aligned Competency Extraction from Chinese Job Ads

Guojing Li, Zichuan Fu, Junyi Li, Wenxia Zhou, Xinyang Wu, Jinning Yang, Jingtong Gao, Feng Huang, and Xiangyu Zhao

arXiv:2604.23009 · cs.CL · April 28, 2026


TL;DR

Chinese-SkillSpan is presented as the first large-scale Chinese JobSkillNER dataset aligned with ESCO. It is built with an LLM-assisted annotation pipeline and targets skill extraction from Chinese job ads.

Contribution

This work introduces a Chinese JobSkillNER dataset together with an LLM-empowered Macro-Micro collaborative annotation pipeline and annotation guidelines tailored to Chinese job postings, addressing the lack of such a resource for Chinese recruitment NLP research.

Findings

01

The dataset contains more than 20,000 annotated instances collected from four major recruitment platforms over the period 2014-2025.

02

Experiments reported in the paper indicate that the dataset is usable for both training and evaluating skill-extraction models.

03

Chinese-SkillSpan supports span-level skill extraction whose annotations are aligned with the ESCO classification.

Abstract

Job Skill Named Entity Recognition (JobSkillNER) aims to automatically extract key skill information from large-scale job posting data, which is important for improving talent-market matching efficiency and supporting personalized employment services. To the best of our knowledge, this work presents the first Chinese JobSkillNER dataset for recruitment texts. We propose annotation guidelines tailored to Chinese job postings and an LLM-empowered Macro-Micro collaborative annotation pipeline. The pipeline leverages the contextual understanding ability of large language models (LLMs) for initial annotation and then refines the results through expert sentence-level adjudication. Using this pipeline, we annotate more than 20,000 instances collected from four major recruitment platforms over the period 2014-2025. Based on these efforts, we release Chinese-SkillSpan, the first Chinese…


Code & Models

Repositories

https://sites.google.com/view/cn-skillspan-resources
