TL;DR
Chinese-SkillSpan is the first large-scale Chinese JobSkillNER dataset aligned with ESCO, built with an LLM-assisted annotation pipeline to improve skill extraction from Chinese job ads.
Contribution
This work introduces a Chinese JobSkillNER dataset, annotation guidelines tailored to Chinese job postings, and an LLM-empowered Macro-Micro collaborative annotation pipeline, filling a key resource gap for Chinese recruitment NLP research.
Findings
The dataset contains more than 20,000 annotated instances collected from four major recruitment platforms over the period 2014-2025.
Experimental results demonstrate the dataset's effectiveness for model training and evaluation.
Chinese-SkillSpan supports better skill extraction aligned with ESCO standards.
Abstract
Job Skill Named Entity Recognition (JobSkillNER) aims to automatically extract key skill information from large-scale job posting data, which is important for improving talent-market matching efficiency and supporting personalized employment services. To the best of our knowledge, this work presents the first Chinese JobSkillNER dataset for recruitment texts. We propose annotation guidelines tailored to Chinese job postings and an LLM-empowered Macro-Micro collaborative annotation pipeline. The pipeline leverages the contextual understanding ability of large language models (LLMs) for initial annotation and then refines the results through expert sentence-level adjudication. Using this pipeline, we annotate more than 20,000 instances collected from four major recruitment platforms over the period 2014-2025. Based on these efforts, we release Chinese-SkillSpan, the first Chinese…
