The Impact of Question Framing on the Performance of Automatic Occupation Coding
Olga Kononykhina, Frauke Kreuter, Malte Schierholz

TL;DR
This study examines how different question wordings in occupational surveys affect data variability and the performance of automatic coding tools, highlighting the importance of question design for accurate occupational data classification.
Contribution
The study demonstrates that question phrasing significantly affects the efficiency of automatic coding and suggests ways to optimize survey questions for better data quality.
Findings
Automatic coding tools perform better on responses to job title questions than on responses to occupational task questions.
Question wording influences data variability and coding accuracy.
Providing examples increases response length without broadening vocabulary.
Abstract
Occupational data play a vital role in research, official statistics, and policymaking, yet their collection and accurate classification remain a challenge. This study investigates the effects of occupational question wording on data variability and the performance of automatic coding tools. We conducted and replicated a split-ballot survey experiment in Germany using two common occupational question formats: one focusing on "job title" (Berufsbezeichnung) and another on "berufliche Tätigkeit" (loosely translated as occupation or occupational task). Our analysis reveals that automatic coding tools, such as CASCOT and OccuCoDe, exhibit sensitivity to the form and origin of the data. Specifically, these tools were more efficient when coding responses to the job title question format than the occupational task format, suggesting a potential way to improve the respective questions for…
Taxonomy
Topics: Educational Technology and Assessment
