LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models
Nan Li, Bo Kang, Tijl De Bie

TL;DR
This paper presents LLM4Jobs, an unsupervised approach that uses large language models to extract and standardize occupations from free-text job postings and resumes. It outperforms unsupervised state-of-the-art benchmarks on both synthetic and real-world datasets.
Contribution
Introduces LLM4Jobs, a novel unsupervised method that leverages LLMs for occupation coding, releases new synthetic and real-world datasets, and reports better performance than unsupervised state-of-the-art methods.
Findings
LLM4Jobs outperforms existing unsupervised benchmarks.
The approach is versatile across diverse datasets and granularities.
New synthetic and real-world datasets are provided.
Abstract
Automated occupation extraction and standardization from free-text job postings and resumes are crucial for applications like job recommendation and labor market policy formation. This paper introduces LLM4Jobs, a novel unsupervised methodology that taps into the capabilities of large language models (LLMs) for occupation coding. LLM4Jobs uniquely harnesses both the natural language understanding and generation capacities of LLMs. Through rigorous experimentation on synthetic and real-world datasets, we demonstrate that LLM4Jobs consistently surpasses unsupervised state-of-the-art benchmarks, showing its versatility across diverse datasets and granularities. As a side result of our work, we present both synthetic and real-world datasets, which may be instrumental for subsequent research in this domain. Overall, this investigation highlights the promise of contemporary LLMs…
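To make the task concrete, the sketch below illustrates what occupation coding means: mapping a free-text job title to an entry in a standard taxonomy. It is not the LLM4Jobs method. It swaps in plain token-overlap (Jaccard) similarity where an LLM-based approach would use learned representations, and the taxonomy codes, labels, and function names are all hypothetical placeholders.

```python
import re

# Hypothetical mini-taxonomy: these codes and labels are placeholders,
# not entries from any real occupational classification.
TAXONOMY = {
    "OCC-001": "software developer",
    "OCC-002": "registered nurse",
    "OCC-003": "primary school teacher",
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def code_occupation(free_text, taxonomy=TAXONOMY):
    """Return (code, label, score) for the best-matching taxonomy entry.

    Token overlap is a stand-in for the similarity an LLM would provide.
    """
    query = tokens(free_text)
    code, label = max(
        taxonomy.items(),
        key=lambda item: jaccard(query, tokens(item[1])),
    )
    return code, label, jaccard(query, tokens(label))


if __name__ == "__main__":
    print(code_occupation("Senior Software Developer (Python)"))
```

Token overlap fails on paraphrases such as "coder" versus "software developer", which is the kind of gap that motivates using a language model for this task.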
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
