NNOSE: Nearest Neighbor Occupational Skill Extraction
Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

TL;DR
NNOSE introduces a retrieval-augmented approach for occupational skill extraction that leverages multiple datasets and external datastores to improve performance, especially on rare skills, without extra fine-tuning.
Contribution
The paper presents NNOSE, a novel retrieval-based method that enhances skill extraction across diverse datasets by utilizing external datastores, addressing data scarcity and rare skill identification.
Findings
Up to 30% span-F1 improvement in cross-dataset skill prediction.
Effective retrieval-augmentation enhances performance without additional fine-tuning.
Improved identification of infrequent occupational skills.
Abstract
The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle three challenges in occupational skill extraction: combining and leveraging multiple datasets, identifying skills that are rarely observed within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, Nearest Neighbor Occupational Skill Extraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This…
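The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a kNN-LM-style setup in which token embeddings and their tags from several datasets are pooled into one datastore, the k nearest entries vote on a tag distribution, and that distribution is interpolated with the tagger's own prediction. The function names, the BIO-style tags, the toy 2-d embeddings, and the parameters `k`, `temperature`, and `lam` are all hypothetical.

```python
import math
from collections import defaultdict


def build_datastore(datasets):
    """Pool (embedding, tag) pairs from several datasets into one list.

    `datasets` maps a dataset name to a list of (vector, tag) pairs.
    The structure is an assumption made for this sketch.
    """
    store = []
    for name, examples in datasets.items():
        for vec, tag in examples:
            store.append((vec, tag, name))
    return store


def knn_distribution(query, store, k=2, temperature=1.0):
    """Tag distribution from the k nearest datastore entries.

    Weights are a softmax over negative squared L2 distances,
    summed per tag (an assumed, kNN-LM-style choice).
    """
    scored = sorted(
        (sum((q - v) ** 2 for q, v in zip(query, vec)), tag)
        for vec, tag, _ in store
    )[:k]
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, tag) in zip(weights, scored):
        probs[tag] += weight / total
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the tagger's distribution with the retrieved one."""
    tags = set(model_probs) | set(knn_probs)
    return {
        t: lam * knn_probs.get(t, 0.0) + (1 - lam) * model_probs.get(t, 0.0)
        for t in tags
    }


# Toy data: two hypothetical datasets with 2-d "embeddings".
datasets = {
    "dataset_a": [([0.0, 0.0], "O"), ([1.0, 1.0], "B-SKILL")],
    "dataset_b": [([0.9, 1.1], "B-SKILL"), ([0.1, -0.1], "O")],
}
store = build_datastore(datasets)

# The tagger alone slightly prefers "O" for this token.
model_probs = {"O": 0.6, "B-SKILL": 0.4}
knn_probs = knn_distribution([1.0, 1.0], store, k=2)
final = interpolate(model_probs, knn_probs, lam=0.5)
prediction = max(final, key=final.get)
```

In this toy case both nearest neighbors carry the skill tag, so the interpolated prediction moves from "O" to "B-SKILL" without any change to the tagger's weights. The neighbors come from two different datasets, which is the dataset-unifying effect the abstract refers to.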
Taxonomy
Topics: Scheduling and Timetabling Solutions
