Large Language Models as Batteries-Included Zero-Shot ESCO Skills Matchers
Benjamin Clavié, Guillaume Soulié

TL;DR
This paper presents a zero-shot skills extraction system using large language models that generates synthetic data, employs retrieval and re-ranking techniques, and outperforms previous methods without requiring human annotations.
Contribution
The work introduces an end-to-end LLM-based zero-shot skills extraction framework that leverages synthetic data and re-ranking, significantly improving accuracy over prior approaches.
Findings
Synthetic data improves skills extraction accuracy.
GPT-4 re-ranking improves RP@10 by over 22 points.
Framing the task as mock programming prompts yields better results with weaker LLMs.
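The RP@10 figure above refers to R-Precision at K. As a minimal sketch, assuming the common definition (relevant labels found in the top K predictions, divided by the smaller of K and the number of gold labels, averaged over samples), it could be computed as follows. The function names and the toy labels are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Set


def r_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """RP@K for one sample: hits in the top k over min(k, number of gold labels).

    Assumes `ranked` contains no duplicate labels.
    """
    if not relevant:
        raise ValueError("each sample needs at least one gold label")
    hits = sum(1 for label in ranked[:k] if label in relevant)
    return hits / min(k, len(relevant))


def mean_rp_at_k(predictions: List[Sequence[str]], gold: List[Set[str]], k: int) -> float:
    """Average RP@K over all samples (multiply by 100 to report it in points)."""
    scores = [r_precision_at_k(p, g, k) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)


# Toy example with made-up skill labels.
ranked = ["python", "sql", "excel", "java"]
gold = {"python", "java"}
rp_at_3 = r_precision_at_k(ranked, gold, k=3)  # one hit, two gold labels
rp_at_4 = r_precision_at_k(ranked, gold, k=4)  # both gold labels found
```

Dividing by `min(k, len(relevant))` keeps the score at 1.0 reachable even when a sample has fewer gold skills than K, which plain precision at K would penalise.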
Abstract
Understanding labour market dynamics requires accurately identifying the skills required for and possessed by the workforce. Automation techniques are increasingly being developed to support this effort. However, automatically extracting skills from job postings is challenging due to the vast number of existing skills. The ESCO (European Skills, Competences, Qualifications and Occupations) framework provides a useful reference, listing over 13,000 individual skills. However, skills extraction remains difficult and accurately matching job posts to the ESCO taxonomy is an open problem. In this work, we propose an end-to-end zero-shot system for skills extraction from job descriptions based on large language models (LLMs). We generate synthetic training data for the entirety of ESCO skills and train a classifier to extract skill mentions from job posts. We also employ a similarity…
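The retrieve-then-re-rank pattern mentioned in the summary can be sketched as below. This is not the authors' implementation: the bag-of-words cosine similarity stands in for whatever similarity model the system uses, the `score` callback stands in for an LLM re-ranker, and the skill names and descriptions are made up for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def bow(text: str) -> Counter:
    """Lowercased whitespace-token counts (a stand-in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, skills: Dict[str, str], top_n: int) -> List[str]:
    """Return the top_n skill names whose descriptions are most similar to the query."""
    q = bow(query)
    ranked = sorted(skills, key=lambda name: cosine(q, bow(skills[name])), reverse=True)
    return ranked[:top_n]


def rerank(
    query: str, candidates: List[str], score: Callable[[str, str], float], k: int
) -> List[str]:
    """Re-order retrieved candidates with a second scorer (hypothetical LLM stand-in)."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:k]


# Hypothetical mini-taxonomy: skill name -> short description.
skills = {
    "Python (computer programming)": "write programs in python",
    "SQL": "query relational databases with sql",
    "manage budgets": "plan and monitor budgets",
}
query = "we need someone to write python programs and query databases"
candidates = retrieve(query, skills, top_n=2)
```

The cheap similarity step narrows thousands of skills to a short candidate list, so the more expensive re-ranker only has to compare a handful of options per job post.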
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dropout
