Job Skill Extraction via LLM-Centric Multi-Module Framework
Guojing Li (1, 2), Zichuan Fu (1), Junyi Li (1), Faxue Liu (1), Wenxia Zhou (2), Yejing Wang (1), Jingtong Gao (1), Maolin Wang (1), Rungen Liu (1), Wenlin Zhang (1), Xiangyu Zhao (1) ((1) City University of Hong Kong, (2) Renmin University of China)

TL;DR
This paper introduces SRICL, a framework that improves LLM-based skill extraction from job ads by combining semantic retrieval, in-context learning, supervised fine-tuning, and a deterministic verifier, making the extracted skill spans more accurate and reliable.
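The retrieval plus in-context-learning step can be illustrated as "find the annotated sentences most similar to the query and place them in the prompt as demonstrations." The sketch below is a hedged illustration, not SRICL's code. `embed` is a hypothetical bag-of-words stand-in for whatever sentence encoder the authors use, the three pool sentences are invented, the inline `<skill>...</skill>` output format and the prompt template are assumptions, and ESCO definitions are omitted.

```python
import math
from collections import Counter

# Hypothetical annotated pool: (sentence, gold output with inline <skill> tags).
POOL = [
    ("Strong Python programming skills", "Strong <skill>Python programming</skill> skills"),
    ("Forklift license required", "<skill>Forklift license</skill> required"),
    ("Experience with SQL databases", "Experience with <skill>SQL databases</skill>"),
]


def embed(text: str) -> Counter:
    """Stand-in encoder (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demos(query: str, pool, k: int = 2):
    """Return the k pool entries whose sentences are most similar to the query."""
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)[:k]


def build_prompt(query: str, demos) -> str:
    """Assemble a format-constrained few-shot prompt (hypothetical template)."""
    parts = ["Wrap every skill span in <skill>...</skill> and copy all other text unchanged.", ""]
    for sentence, tagged in demos:
        parts += [f"Input: {sentence}", f"Output: {tagged}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


demos = retrieve_demos("Python and SQL experience", POOL, k=2)
prompt = build_prompt("Python and SQL experience", demos)
print(prompt)
```

Here the SQL sentence ranks first (two shared tokens), then the Python sentence (one shared token), and the unrelated forklift example is left out. Swapping `embed` for a dense encoder changes only the similarity function, not the retrieve-then-prompt structure.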
Contribution
The paper presents SRICL, a novel multi-module framework that significantly improves span-level skill extraction accuracy and robustness across diverse domains and languages.
Findings
SRICL outperforms GPT-3.5 prompting baselines on STRICT-F1.
It reduces invalid tags and hallucinations in extracted spans.
It enables dependable deployment in low-resource, multi-domain settings.
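STRICT-F1 is the headline metric here. A common reading of "strict" span F1, assumed in this sketch rather than taken from the paper, is that a predicted span counts as correct only when its boundaries exactly match a gold span, so boundary drift is penalized twice: once as a false positive and once as a missed gold span. Spans are represented as hypothetical end-exclusive `(start, end)` token offsets with a single skill label.

```python
Span = tuple[int, int]  # (start, end) token offsets, end-exclusive (assumption)


def strict_f1(gold: list[set[Span]], pred: list[set[Span]]) -> float:
    """Micro-averaged F1 where only exact boundary matches count as hits."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # exact boundary matches
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans never predicted exactly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Sentence 1 has one drifted boundary: (5, 6) predicted vs (5, 7) gold.
gold = [{(0, 2), (5, 7)}, {(3, 4)}]
pred = [{(0, 2), (5, 6)}, {(3, 4)}]
print(strict_f1(gold, pred))
```

With two exact hits, one false positive, and one miss, precision and recall are both 2/3, so the score is about 0.667; a lenient overlap-based metric would have counted the drifted span as correct.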
Abstract
Span-level skill extraction from job advertisements underpins candidate-job matching and labor-market analytics, yet generative large language models (LLMs) often yield malformed spans, boundary drift, and hallucinations, especially with long-tail terms and cross-domain shift. We present SRICL, an LLM-centric framework that combines semantic retrieval (SR), in-context learning (ICL), and supervised fine-tuning (SFT) with a deterministic verifier. SR pulls in-domain annotated sentences and definitions from ESCO to form format-constrained prompts that stabilize boundaries and handle coordination. SFT aligns output behavior, while the verifier enforces pairing, non-overlap, and BIO legality with minimal retries. On six public span-labeled corpora of job-ad sentences across sectors and languages, SRICL achieves substantial STRICT-F1 improvements over GPT-3.5 prompting baselines and sharply…
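The abstract says the verifier enforces pairing, non-overlap, and BIO legality with minimal retries. Below is a minimal sketch of what such a deterministic check could look like, assuming (hypothetically) that the model marks skills inline with `<skill>...</skill>` tags. `parse_tagged`, `bio_is_legal`, `extract_with_retries`, the retry budget, and the empty-result fallback are all illustrative choices, not the authors' implementation.

```python
import re
from typing import Callable, List, Optional, Tuple

TAG_RE = re.compile(r"</?skill>")  # hypothetical inline tag format


def parse_tagged(source: str, tagged: str) -> Optional[List[Tuple[int, int]]]:
    """Map an inline-tagged output back to character spans over `source`.

    Returns None if tags are unpaired, nested/overlapping, or empty, or if the
    untagged text differs from the source (e.g. a hallucinated rewrite).
    """
    spans: List[Tuple[int, int]] = []
    plain: List[str] = []
    open_at: Optional[int] = None
    pos = 0   # offset into the reconstructed untagged text
    last = 0  # end of the previous tag in `tagged`
    for m in TAG_RE.finditer(tagged):
        chunk = tagged[last:m.start()]
        plain.append(chunk)
        pos += len(chunk)
        last = m.end()
        if m.group() == "<skill>":
            if open_at is not None:  # second open before a close: nesting/overlap
                return None
            open_at = pos
        else:
            if open_at is None or open_at == pos:  # unmatched close or empty span
                return None
            spans.append((open_at, pos))
            open_at = None
    plain.append(tagged[last:])
    if open_at is not None or "".join(plain) != source:  # unclosed tag or altered text
        return None
    return spans


def bio_is_legal(tags: List[str]) -> bool:
    """Single-type BIO check: only B/I/O, and I must follow B or I."""
    prev = "O"
    for tag in tags:
        if tag not in ("B", "I", "O"):
            return False
        if tag == "I" and prev == "O":
            return False
        prev = tag
    return True


def extract_with_retries(source: str, generate: Callable[[str], str],
                         max_retries: int = 2) -> List[Tuple[int, int]]:
    """Call the (hypothetical) generator until its output verifies."""
    for _ in range(1 + max_retries):
        spans = parse_tagged(source, generate(source))
        if spans is not None:
            return spans
    return []  # assumption: abstain rather than emit malformed spans


source = "Experience with SQL databases"
print(parse_tagged(source, "Experience with <skill>SQL databases</skill>"))  # [(16, 29)]
```

Because a second opening tag is rejected before the first one closes, accepted spans are non-overlapping by construction; comparing the tag-stripped text against the source is what catches outputs that silently rewrite or invent words.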
