Building a Custom Taxonomy of AI Skills and Tasks from the Ground Up with Job Postings

Stephen Meisenbacher; Peter Norlander

arXiv:2605.21029·cs.CL·May 21, 2026

TL;DR

This paper introduces TaxonomyBuilder, a systematic approach for constructing AI skills taxonomies from job postings, and shows that filtering the input data yields better domain-specific coverage than using unfiltered inputs.

Contribution

The paper presents TaxonomyBuilder as a blueprint for building custom, data-informed, hierarchical AI skills taxonomies from large-scale job postings, and uses it to study which data points to include or exclude.

Findings

1. Filtered data yields better domain coverage than unfiltered data.

2. Less data can lead to clearer, more accurate taxonomies.

3. Hierarchical taxonomy labeling benefits from data filtering.

Abstract

Utilizing LLMs for automated taxonomy construction presents a clear opportunity for the comprehensive, yet efficient mapping of potentially complex domains. When contending with high volumes of rapidly growing corpora, however, it becomes unclear how to best leverage such data for optimal taxonomy construction. Taking the case of systematizing AI skills in the workplace, we use two large-scale job postings corpora to investigate key design decisions for the inclusion (or exclusion) of data points for taxonomy construction. We propose TaxonomyBuilder as a blueprint for our systematic study, with which we evaluate various configurations of custom, data-informed, and hierarchical taxonomies. We demonstrate that less data can provide more clarity: filtering inputs to TaxonomyBuilder provides better domain-specific coverage than offering unfiltered inputs to clustering and LLM-enhanced…
