Improving Clustering on Occupational Text Data through Dimensionality Reduction
Iago Xabier Vázquez García, Damla Partanaz, Emrullah Fatih Yetkin

TL;DR
This paper presents a novel clustering pipeline using BERT-based techniques and dimensionality reduction to improve occupational data mapping, aiding career transitions and data consistency across different regions.
Contribution
It introduces a new clustering and dimensionality reduction pipeline tailored for occupational data, enhancing automatic occupation distinction and mapping accuracy.
Findings
Dimensionality reduction improves the metrics used to measure clustering performance.
A specialized silhouette approach further improves clustering quality.
The pipeline maps occupations across differing definitions.
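Since the findings rely on the silhouette measure, a minimal sketch of the standard silhouette coefficient may help as background. The summary does not describe the paper's specialized variant, so this is not the authors' method: the function names, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean standard silhouette coefficient over all points.

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other
    cluster, and s = (b - a) / max(a, b).
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D vectors standing in for (reduced) text embeddings; purely hypothetical.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # groups nearby points
bad_score = silhouette_score(points, [0, 1, 0, 1])   # mixes distant points
print(good_score, bad_score)
```

In this toy case the labeling that groups nearby points scores close to 1, while the labeling that mixes distant points scores below 0. That is the sense in which a higher silhouette value indicates better-separated clusters.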
Abstract
In this study, we propose an optimal clustering mechanism for the occupations defined in O*NET, the well-known US-based occupational database. Although all occupations are defined according to well-conducted surveys in the US, their definitions can vary across firms and countries. Hence, to extend the data already collected in O*NET to occupations defined with different tasks, a map between the definitions is a vital requirement. We propose a pipeline that combines several BERT-based techniques with various clustering approaches to obtain such a map. We also examine the effect of dimensionality reduction approaches on several metrics used to measure the performance of clustering algorithms. Finally, we improve our results by using a specialized silhouette approach. This new clustering-based mapping approach with dimensionality reduction may…
Taxonomy
Topics: Occupational Therapy Practice and Research · Information Systems Education and Curriculum Development · AI and HR Technologies
