NNOSE: Nearest Neighbor Occupational Skill Extraction

Mike Zhang; Rob van der Goot; Min-Yen Kan; Barbara Plank

arXiv:2401.17092 · cs.CL · January 31, 2024 · 1 citation

TL;DR

NNOSE introduces a retrieval-augmented approach for occupational skill extraction that leverages multiple datasets and external datastores to improve performance, especially on rare skills, without extra fine-tuning.

Contribution

The paper presents NNOSE, a novel retrieval-based method that enhances skill extraction across diverse datasets by utilizing external datastores, addressing data scarcity and rare skill identification.

Findings

1. Up to 30% span-F1 improvement in cross-dataset skill prediction.

2. Effective retrieval-augmentation enhances performance without additional fine-tuning.

3. Improved identification of infrequent occupational skills.
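
To make the span-F1 figure above concrete, here is a minimal sketch of how a span-level F1 score is commonly computed for sequence labeling. It assumes plain B/I/O tags and exact-match spans; the page does not specify the paper's tag scheme or evaluation script, so the helper names `bio_to_spans` and `span_f1` are hypothetical and the details are illustrative only.

```python
def bio_to_spans(tags):
    """Collect (start, end) spans from a B/I/O tag sequence (end is exclusive)."""
    spans = set()
    start = None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag != "I" and start is not None:
            spans.add((start, i))
            start = None
        if tag == "B":
            start = i
        # An "I" with no open span is ignored (assumption: strict BIO).
    return spans


def span_f1(gold_tags, pred_tags):
    """Exact-match span precision, recall, and F1 for one sequence."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: two gold skill spans, the prediction finds only the first.
gold = ["O", "B", "I", "O", "B"]
pred = ["O", "B", "I", "O", "O"]
precision, recall, f1 = span_f1(gold, pred)
```

Under this definition a span counts as correct only if both boundaries match exactly, so partially overlapping predictions earn no credit.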

Abstract

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle three challenges posed by occupational skill datasets: combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, Nearest Neighbor Occupational Skill Extraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This…
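
The retrieval step described in the abstract can be illustrated with a small sketch. It assumes a kNN-LM-style setup: the datastore holds (token embedding, tag) pairs pooled from several datasets, the nearest neighbors of a query embedding vote on a tag, and that vote is interpolated with the tagger's own distribution. The function names, the squared-L2 distance, the temperature, and the weight `lam` are all assumptions for illustration; the page does not state the paper's exact formulation.

```python
import math
from collections import defaultdict


def knn_tag_distribution(query, datastore, k=4, temperature=1.0):
    """Distribution over tags from the k nearest (embedding, tag) entries."""
    # Squared L2 distance from the query to every stored embedding.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), tag)
        for key, tag in datastore
    ]
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Softmax over negative distances: closer neighbors weigh more.
    weights = [math.exp(-dist / temperature) for dist, _ in neighbors]
    total = sum(weights)
    dist_over_tags = defaultdict(float)
    for weight, (_, tag) in zip(weights, neighbors):
        dist_over_tags[tag] += weight / total
    return dict(dist_over_tags)


def interpolate(p_model, p_knn, lam=0.3):
    """Mix the tagger's distribution with the retrieved one (lam is assumed)."""
    tags = set(p_model) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
        for t in tags
    }


# Toy datastore: 2-d "embeddings" paired with B/I/O tags (made-up values).
datastore = [
    ((0.0, 0.0), "O"),
    ((1.0, 1.0), "B"),
    ((1.1, 0.9), "B"),
    ((5.0, 5.0), "I"),
]
p_model = {"O": 0.6, "B": 0.3, "I": 0.1}  # hypothetical tagger output
p_knn = knn_tag_distribution((1.0, 1.0), datastore, k=2)
p_final = interpolate(p_model, p_knn, lam=0.5)
best_tag = max(p_final, key=p_final.get)
```

In this toy case the tagger alone prefers "O", but both retrieved neighbors carry "B", so the interpolated distribution favors "B". Because the datastore is only queried at inference time, new entries can be added without retraining the tagger, which is consistent with the no-extra-fine-tuning claim above.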

Code & Models

Repositories

mainlp/nnose (PyTorch, official)

Taxonomy

Topics: Scheduling and Timetabling Solutions