# CMap: a database for mapping job titles, sector specialization, and promotions across 24 sectors

**Authors:** Shehryar Subhani, Shahan Ali Memon, Bedoor AlShebli

PMC · DOI: 10.1038/s41597-025-05526-3 · Scientific Data · 2025-07-14

## TL;DR

CMap is a large-scale dataset of standardized job titles and identified promotions spanning 24 industry sectors, built to study career mobility and labor market patterns.

## Contribution

CMap contributes a standardized job title database and a Specialization Index that quantifies how concentrated a given title is within a sector, supporting analysis of sectoral concentration and career advancement.

## Key findings

- 5.2 million job titles were standardized into 123 thousand unique titles using NLP and large language models.
- 32 thousand validated promotions were identified in the United States and the United Kingdom, along with 61 thousand inferred promotions from a global context.
- The dataset supports research on job hierarchies, cross-sector mobility, and inequalities in professional advancement.

## Abstract

Understanding job titles, career trajectories, and promotions provides valuable insight into labor market dynamics and patterns of professional mobility. We introduce Career Map (CMap), a novel, large-scale dataset spanning 24 industry sectors, designed to support the study of job specialization, sectoral concentration, and career advancement. Using natural language processing techniques and large language models, we standardize 5.2 million job titles into 123 thousand unique titles and propose a Specialization Index to quantify how concentrated a given title is within a sector. The dataset includes both a structured job titles dataset and a set of identified promotions—32 thousand validated promotions from the United States and the United Kingdom, and 61 thousand inferred promotions from a global context. CMap enables research on job hierarchies, cross-sector mobility, and systemic inequalities in professional advancement. It provides a foundation for examining how education, experience, and institutional structures shape career outcomes across industries and regions, offering a valuable resource for economists, sociologists, and computational social scientists.
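The abstract describes the Specialization Index only as a measure of how concentrated a title is within a sector and does not give its formula. As a rough illustration of that idea, the sketch below computes, for each title, the share of its occurrences that fall in each sector. The function name `sector_shares`, the `(title, sector)` record layout, and the share-based measure itself are assumptions made for illustration; the paper's actual definition may differ.

```python
from collections import Counter, defaultdict


def sector_shares(records):
    """Return {title: {sector: share}} from (title, sector) pairs.

    Hypothetical stand-in for a concentration measure: the share is the
    fraction of a title's occurrences observed in a given sector. This is
    NOT taken from the paper.
    """
    counts = defaultdict(Counter)
    for title, sector in records:
        counts[title][sector] += 1

    shares = {}
    for title, by_sector in counts.items():
        total = sum(by_sector.values())
        shares[title] = {s: n / total for s, n in by_sector.items()}
    return shares


# Toy records (invented for illustration, not drawn from CMap).
records = [
    ("nurse", "Health"),
    ("nurse", "Health"),
    ("nurse", "Education"),
    ("manager", "Health"),
    ("manager", "Retail"),
]
shares = sector_shares(records)
```

Under this stand-in measure, a title whose share is close to 1 in a single sector would count as highly specialized, while a title spread evenly across sectors would not.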

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12260075/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12260075/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC12260075/full.md

---
Source: https://tomesphere.com/paper/PMC12260075