Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction
Jens-Joris Decorte, Jeroen Van Hautte, Johannes Deleu, Chris Develder, and Thomas Demeester

TL;DR
This paper presents a new end-to-end skill extraction system using distant supervision and novel negative sampling strategies, significantly improving the detection of implicit skills in job market data.
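Distant supervision here means training labels are produced automatically rather than annotated by hand. The sketch below assumes the simplest form of this: literal matching of skill surface forms (preferred and alternative labels) against job-ad sentences. The function `distant_labels` and the toy skill dictionary are hypothetical illustrations, not the authors' code. The example also shows the limitation the paper targets: a skill that is only described implicitly produces no match.

```python
import re
from typing import Dict, List, Tuple


def distant_labels(
    sentences: List[str], skill_labels: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Pair each sentence with every skill whose surface form occurs in it literally.

    Hypothetical sketch: assumes distant supervision by exact (case-insensitive)
    string matching of skill labels, which is an assumption, not the paper's code.
    """
    pairs = []
    for sentence in sentences:
        lowered = sentence.lower()
        for skill_id, surface_forms in skill_labels.items():
            for form in surface_forms:
                # Lookarounds require a non-word boundary on both sides,
                # so "java" does not fire inside "javascript".
                pattern = r"(?<!\w)" + re.escape(form.lower()) + r"(?!\w)"
                if re.search(pattern, lowered):
                    pairs.append((sentence, skill_id))
                    break  # one matching surface form per skill is enough
    return pairs


# Toy skill inventory with hypothetical IDs (not real ESCO URIs).
skills = {"S1": ["python", "python programming"], "S2": ["team work", "teamwork"]}
sentences = [
    "Experience with Python is required.",
    "You thrive in a collaborative team.",  # implicit teamwork: no literal match
]
pairs = distant_labels(sentences, skills)
```

Only the first sentence receives a label. The second sentence clearly describes teamwork but never names it, so literal matching yields training data that is skewed toward explicitly mentioned skills.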
Contribution
It introduces and evaluates several negative sampling strategies, notably one based on the ESCO taxonomy, to improve skill extraction trained on distantly supervised data.
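To make "taxonomy-based negative sampling" concrete, here is a minimal sketch. It assumes that a "related" skill is a sibling under the same parent group in the ESCO hierarchy and that such siblings serve as hard negatives for a sentence whose positive skill is known. The paper's exact selection rule may differ. `taxonomy_negatives`, `parent_of`, and the toy hierarchy are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def taxonomy_negatives(
    positive: str,
    parent_of: Dict[str, str],
    k: int,
    exclude: Sequence[str] = (),
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Sample k negative skills, preferring siblings of `positive` in the taxonomy.

    Hypothetical sketch: siblings (same parent group) are treated as hard
    negatives, and the sample is topped up with other skills if too few exist.
    """
    rng = rng or random.Random()
    # Never use the positive itself or other known positives as negatives.
    banned = set(exclude) | {positive}
    parent = parent_of.get(positive)
    siblings = (
        [s for s, p in parent_of.items() if p == parent and s not in banned]
        if parent is not None
        else []
    )
    if len(siblings) >= k:
        return rng.sample(siblings, k)
    # Not enough related skills: fill up with random skills from elsewhere.
    others = [s for s in parent_of if s not in banned and s not in siblings]
    return siblings + rng.sample(others, min(k - len(siblings), len(others)))


# Toy hierarchy: skill -> parent group (illustrative, not real ESCO data).
parent_of = {
    "python": "programming",
    "java": "programming",
    "sql": "programming",
    "teamwork": "social",
    "negotiation": "social",
}
negs = taxonomy_negatives("python", parent_of, k=2, rng=random.Random(0))
```

The intuition behind this design choice is that random negatives such as "negotiation" are easy to reject for a programming sentence, whereas siblings such as "java" force the model to learn finer distinctions between closely related skills.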
Findings
Using the ESCO taxonomy for negative sampling yields the largest improvements.
Combining multiple strategies increases performance by up to 8 percentage points in RP@5.
The proposed system effectively detects implicit skills despite limited annotated data.
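The summary does not define RP@5. Assuming it denotes R-Precision@K as commonly used in extreme multi-label classification, each example is scored as the number of correct skills among the top K predictions divided by min(K, number of gold skills), and the scores are then averaged. Unlike plain Precision@K, this lets a sentence with a single gold skill reach a perfect score. The function `rp_at_k` is a hypothetical illustration. Skipping examples that have no gold labels is also an assumption.

```python
from typing import List, Sequence, Set


def rp_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[Set[str]], k: int = 5) -> float:
    """Mean R-Precision@K: top-K hits divided by min(K, number of gold labels)."""
    scores: List[float] = []
    for preds, relevant in zip(ranked, gold):
        if not relevant:
            continue  # undefined without gold labels; skipped here (assumption)
        hits = sum(1 for p in preds[:k] if p in relevant)
        scores.append(hits / min(k, len(relevant)))
    return sum(scores) / len(scores) if scores else 0.0


ranked = [["a", "b", "c", "d", "e"], ["x", "y", "z", "w", "v"]]
gold = [{"a", "c"}, {"q", "y", "z", "w", "v", "u", "t"}]
score = rp_at_k(ranked, gold, k=5)  # (2/2 + 4/5) / 2
```

Read this way, a gain of "8 percentage points in RP@5" is an absolute increase of 0.08 in this averaged score.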
Abstract
Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. Extracting skills is a highly challenging task, given the many thousands of possible skill labels, which may be mentioned explicitly or merely described implicitly, and the lack of finely annotated training corpora. Previous work on skill extraction overly simplifies the task to an explicit entity detection task or builds…
