Multilingual hierarchical classification of job advertisements for job vacancy statistics
Maciej Beręsewicz, Marek Wydmuch, Herman Cherniaiev, Robert Pater

TL;DR
This paper presents a multilingual, hierarchical, transformer-based classifier that assigns occupation codes to online job ads. It improves accuracy and follows international occupational classification standards (ISCO).
Contribution
It introduces a hierarchical occupation-coding model built on the transformer architecture and trained on multilingual data, including a novel data source: the Central Job Offers Database.
Findings
Exploiting the hierarchical structure of the occupation classification improves prediction accuracy by 1-2 percentage points.
The model supports 24 languages, enhancing international comparability.
Open-source software is provided for the statistics community.
Abstract
The goal of this paper is to develop a multilingual classifier and conditional probability estimator of occupation codes for online job advertisements in accordance with the International Standard Classification of Occupations (ISCO) extended with the Polish Classification of Occupations and Specializations (KZiS), which is analogous to the European Classification of Occupations. In this paper, we utilise a range of data sources, including a novel one, namely the Central Job Offers Database, which is a register of all vacancies submitted to Public Employment Offices. Their staff members code the vacancies according to the ISCO and KZiS. A hierarchical multi-class classifier has been developed based on the transformer architecture. The classifier begins by encoding the jobs found in advertisements to the widest 1-digit occupational group, and then narrows the assignment to a 6-digit…
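The abstract describes a classifier that moves from the 1-digit group down to a 6-digit code while also estimating conditional probabilities. The sketch below shows one common way to combine such per-level conditional probabilities, the chain rule over code prefixes. It is not taken from the authors' software. The prefix lengths (ISCO 1 to 4 digits, then a 6-digit KZiS code), the codes, and the probabilities are all hypothetical toy values chosen for illustration. It also contrasts greedy top-down decoding with picking the leaf of highest joint probability, because the two can disagree.

```python
from math import prod

# Hypothetical prefix lengths per hierarchy level
# (assumption: ISCO 1-4 digits, then the 6-digit KZiS code).
LEVELS = (1, 2, 3, 4, 6)

# Toy leaf codes and toy conditional probabilities P(node | parent, ad text);
# in a real model these would come from per-level classifier outputs.
LEAVES = ["261101", "261102", "263101"]
COND_PROB = {
    "2": 0.9,
    "26": 0.8,
    "261": 0.55, "263": 0.45,
    "2611": 1.0, "2631": 1.0,
    "261101": 0.6, "261102": 0.4,
    "263101": 1.0,
}


def path(code: str) -> list[str]:
    """Return the chain of ancestor prefixes, ending with the full code."""
    return [code[:n] for n in LEVELS]


def code_probability(code: str, cond_prob: dict[str, float]) -> float:
    """Chain rule: P(code) = product over levels of P(prefix_l | prefix_{l-1})."""
    return prod(cond_prob[p] for p in path(code))


def best_code(leaves: list[str], cond_prob: dict[str, float]) -> str:
    """Choose the leaf with the highest joint probability along its whole path."""
    return max(leaves, key=lambda c: code_probability(c, cond_prob))


def greedy_code(leaves: list[str], cond_prob: dict[str, float]) -> str:
    """Top-down decoding: at each level keep only the most probable prefix."""
    candidates = list(leaves)
    for n in LEVELS:
        prefixes = sorted({c[:n] for c in candidates})  # sorted for deterministic ties
        best_prefix = max(prefixes, key=lambda p: cond_prob[p])
        candidates = [c for c in candidates if c.startswith(best_prefix)]
    return candidates[0]


if __name__ == "__main__":
    for leaf in LEAVES:
        print(leaf, round(code_probability(leaf, COND_PROB), 4))
    print("joint best:", best_code(LEAVES, COND_PROB))
    print("greedy:", greedy_code(LEAVES, COND_PROB))
```

With these toy numbers, greedy decoding commits to the 3-digit prefix "261" and returns "261101" (joint probability 0.2376). Scoring whole paths instead selects "263101" (0.324). Which strategy the paper uses cannot be determined from this summary.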
