Efficient Text Encoders for Labor Market Analysis
Jens-Joris Decorte, Jeroen Van Hautte, Chris Develder, and Thomas Demeester

TL;DR
This paper introduces ConTeXT-match, a lightweight contrastive learning method for skill extraction from job ads, achieving state-of-the-art accuracy and efficiency, supported by a new benchmark and improved job title normalization.
Contribution
The paper presents a novel contrastive learning approach for skill classification, a new benchmark dataset (Skill-XL), and an improved job title normalization model, all aimed at large-scale labor market analysis.
Findings
ConTeXT-match outperforms existing models in skill extraction accuracy.
The approach is computationally efficient and scalable for real-time analysis.
The new Skill-XL benchmark enables robust evaluation of skill extraction methods.
Abstract
Labor market analysis relies on extracting insights from job advertisements, which provide valuable yet unstructured information on job titles and corresponding skill requirements. While state-of-the-art methods for skill extraction achieve strong performance, they depend on large language models (LLMs), which are computationally expensive and slow. In this paper, we propose ConTeXT-match, a novel contrastive learning approach with token-level attention that is well-suited for the extreme multi-label classification task of skill classification. ConTeXT-match significantly improves skill extraction efficiency and performance, achieving state-of-the-art results with a lightweight bi-encoder model. To support robust evaluation, we introduce Skill-XL, a new benchmark with exhaustive, sentence-level skill annotations that explicitly address the redundancy in the…
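To make the idea of contrastive matching with token-level attention more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names (attention_match_scores, in_batch_contrastive_loss), the dot-product attention over sentence tokens, the in-batch negatives, and the temperature parameter are all choices made for this example. Padding masks and the actual encoder are left out; the inputs stand in for embeddings a bi-encoder would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_match_scores(token_emb, skill_emb):
    """Score every sentence against every skill (hypothetical helper).

    token_emb: (B, T, D) token embeddings for B sentences of T tokens.
    skill_emb: (S, D) one embedding per skill label.
    Returns a (B, S) matrix of match scores.
    """
    # Dot product of each token with each skill: shape (B, T, S).
    dots = np.einsum("btd,sd->bts", token_emb, skill_emb)
    # Each skill attends over the tokens of a sentence (softmax over T).
    weights = softmax(dots, axis=1)
    # Skill-specific sentence vectors: shape (B, S, D).
    pooled = np.einsum("bts,btd->bsd", weights, token_emb)
    # Final score: pooled sentence vector dotted with the skill vector.
    return np.einsum("bsd,sd->bs", pooled, skill_emb)


def in_batch_contrastive_loss(scores, temperature=1.0):
    """Cross-entropy over in-batch negatives (an assumption of this sketch).

    scores: (B, B) matrix where scores[i, i] belongs to the positive
    sentence/skill pair and the other entries in row i act as negatives.
    """
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(4, 6, 16))   # 4 sentences, 6 tokens each
    skills = rng.normal(size=(4, 16))      # skill i is the positive for sentence i
    scores = attention_match_scores(tokens, skills)
    print(scores.shape, in_batch_contrastive_loss(scores))
```

Because the skill embeddings can be computed once and cached, scoring a new sentence in a setup like this needs only one encoder pass plus matrix products, which is the usual efficiency argument for bi-encoders over LLM-based extraction.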
Taxonomy
Methods: Softmax · Attention Is All You Need · Contrastive Learning
