# Data for assigning a proxy variable for office worker in open-ended responses on occupation in Swedish questionnaires

**Authors:** Annika Tillander, Susanna Lehtinen-Jacks, Nisha Singh, Oskar Halling Ullberg, Ulrika Florin, Katarina Bälter

PMC · DOI: 10.1016/j.dib.2025.112105 · Data in Brief · 2025-09-24

## TL;DR

This paper provides data and R code to identify office workers from self-reported occupation titles in Swedish questionnaires, supporting research on the health of office workers.

## Contribution

The paper introduces a proxy variable and R code for classifying respondents as office or non-office workers from open-ended Swedish occupation responses.

## Key findings

- A proxy variable for office workers was developed using SSYK 2012 occupation titles.
- The proxy variable was refined using pilot data from the LifeGene study.
- The R code can be applied to any dataset containing open-ended Swedish occupation responses.

## Abstract

In numerous research disciplines, including epidemiology, it is common to compare occupational categories such as office workers and non-office workers. When only self-reported occupation titles are available, individuals must be categorized from those titles. Being able to identify office workers from self-reported occupation titles can therefore support research on the health and well-being of office workers in large population-based epidemiological studies, even when the questionnaire contains no specific questions about office work.

This paper introduces data and R code that can be used to assign a proxy variable for office work based on responses to an open-ended question (OEQ) about occupation in Swedish questionnaires. The proxy variable is based on the Swedish Standard Classification of Occupations 2012 (SSYK 2012), which includes 8946 occupation titles. Using a translation key, these titles have been categorized into three groups: managers, white-collar workers, and blue-collar workers. White-collar workers (including managers) are considered office workers, while blue-collar workers are classified as non-office workers. The proxy variable has been refined using pilot data from the Swedish population-based epidemiological resource LifeGene.
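The three-group rule described above can be sketched as follows. This is not the authors' R code: it is a minimal Python illustration under stated assumptions. The dictionary `TRANSLATION_KEY`, its example titles, the group labels, and the function `office_worker_proxy` are all hypothetical; the real translation key covers the 8946 SSYK 2012 titles. Responses that match no title are assumed to stay unclassified (`None`) rather than being forced into either category.

```python
from typing import Optional

# Hypothetical translation key: occupation title -> one of three groups.
# The published key covers all SSYK 2012 titles; these entries are invented for illustration.
TRANSLATION_KEY = {
    "verkställande direktör": "manager",
    "ekonomiassistent": "white-collar",
    "snickare": "blue-collar",
}

# Managers and white-collar workers count as office workers; blue-collar workers do not.
OFFICE_GROUPS = {"manager", "white-collar"}


def office_worker_proxy(group: Optional[str]) -> Optional[bool]:
    """Return True (office worker), False (non-office worker), or None (unclassified)."""
    if group is None:
        return None
    if group in OFFICE_GROUPS:
        return True
    if group == "blue-collar":
        return False
    raise ValueError(f"unknown group: {group!r}")


responses = ["ekonomiassistent", "snickare", "verkställande direktör", "astronaut"]
groups = [TRANSLATION_KEY.get(r) for r in responses]  # unmatched titles map to None
proxy = [office_worker_proxy(g) for g in groups]
print(proxy)  # [True, False, True, None]
```

Keeping the three-group label separate from the binary proxy means managers can still be analysed on their own while being counted as office workers.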

The R code and the proxy variable can be used with any dataset that contains a Swedish OEQ about occupation, categorizing respondents as white-collar or blue-collar workers and thereby providing a proxy variable for office work. The R code can also be applied to OEQs in other languages, provided a standard occupational classification is available in that language.
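Because the code depends only on a table of occupation titles and their groups, the matching step can be written without reference to any particular language. The sketch below is a hypothetical Python illustration, not the published R code. It assumes exact matching after light normalization (Unicode NFC, case folding, and whitespace collapsing). The function names `normalize` and `classify_responses` and the example keys are invented for illustration. Spelling variants and abbreviations in real OEQ answers would need further handling, and this sketch does not attempt that.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(title: str) -> str:
    """Apply Unicode NFC, case-fold, and collapse internal whitespace."""
    return " ".join(unicodedata.normalize("NFC", title).casefold().split())


def classify_responses(responses: List[str], key: Dict[str, str]) -> List[Optional[str]]:
    """Exact-match each normalized response against a normalized classification key.

    `key` maps occupation titles (in any language) to a group label;
    responses with no match are returned as None.
    """
    normalized_key = {normalize(title): group for title, group in key.items()}
    return [normalized_key.get(normalize(r)) for r in responses]


# Hypothetical English key: the same code works unchanged for another language.
english_key = {"Accountant": "white-collar", "Bus driver": "blue-collar"}
print(classify_responses(["  accountant", "BUS  DRIVER", "painter"], english_key))

# NFC normalization lets a decomposed "å" match the precomposed form in the key.
print(classify_responses(["Ma\u030alare"], {"målare": "blue-collar"}))
```

Normalizing both the key and the responses with the same function keeps the comparison symmetric, so the classification table can be loaded as-is from whatever source provides it.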

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12528912/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12528912/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC12528912/full.md

---
Source: https://tomesphere.com/paper/PMC12528912