Procode: the Swiss Multilingual Solution for Automatic Coding and Recoding of Occupations and Economic Activities
Nenad Savic, Nicolas Bovio, Fabian Gilbert, Irina Guseva Canu

TL;DR
Procode is a web-based tool that uses machine learning classifiers to automate coding and recoding of occupation and activity data, improving accuracy and efficiency in epidemiological research.
Contribution
The paper introduces Procode, a novel web-tool that employs machine learning classifiers for automatic coding and recoding of occupation and activity classifications.
Findings
Complement Naive Bayes (CNB) achieved 57-81% accuracy in coding tasks.
Coding took approximately 1 minute per 10,000 records.
Recoding was completed in 5-10 seconds, faster than coding.
Abstract
Objective. Epidemiological studies require data that are in alignment with the classifications established for occupations or economic activities. The classifications usually include hundreds of codes and titles. Manual coding of raw data may result in misclassification and be time consuming. The goal was to develop and test a web-tool, named Procode, for coding of free-texts against classifications and recoding between different classifications. Methods. Three text classifiers, i.e. Complement Naive Bayes (CNB), Support Vector Machine (SVM) and Random Forest Classifier (RFC), were investigated using a k-fold cross-validation. 30 000 free-texts with manually assigned classification codes of French classification of occupations (PCS) and French classification of activities (NAF) were available. For recoding, Procode integrated a workflow that converts codes of one classification to…
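The Methods text names Complement Naive Bayes evaluated with k-fold cross-validation but gives no implementation details. The following is a minimal pure-Python sketch of that idea, not Procode's actual code. The toy free-texts, the labels `HEALTH`, `BUILD` and `TEACH`, the whitespace tokenizer, the smoothing value `alpha=1.0` and the choice `k=4` are all assumptions made for illustration; they are not PCS or NAF codes and are not taken from the 30 000 records described above.

```python
import math
from collections import Counter, defaultdict

# Invented toy data (hypothetical labels, NOT real PCS/NAF codes).
DATA = [
    ("hospital nurse", "HEALTH"), ("construction worker", "BUILD"),
    ("school teacher", "TEACH"), ("nurse in clinic", "HEALTH"),
    ("site construction mason", "BUILD"), ("primary school teacher", "TEACH"),
    ("hospital care nurse", "HEALTH"), ("mason building site", "BUILD"),
    ("teacher high school", "TEACH"), ("clinic nurse assistant", "HEALTH"),
    ("building construction worker", "BUILD"), ("school teaching assistant", "TEACH"),
]

def tokenize(text):
    # Assumption: lowercase whitespace tokens; a real tool would normalize more.
    return text.lower().split()

def train_cnb(texts, labels, alpha=1.0):
    """Fit Complement Naive Bayes: each class is described by the
    smoothed word frequencies of all OTHER classes (its complement)."""
    per_class = defaultdict(Counter)
    total = Counter()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        per_class[label].update(tokens)
        total.update(tokens)
    vocab = sorted(total)
    grand_total = sum(total.values())
    weights = {}
    for label, counts in per_class.items():
        complement_total = grand_total - sum(counts.values())
        denom = complement_total + alpha * len(vocab)
        weights[label] = {
            w: math.log((total[w] - counts[w] + alpha) / denom) for w in vocab
        }
    return weights

def predict_cnb(weights, text):
    """Pick the class whose complement explains the text worst."""
    tokens = tokenize(text)

    def complement_score(label):
        w = weights[label]
        # Words never seen in training are skipped.
        return sum(w[t] for t in tokens if t in w)

    # sorted() makes tie-breaking deterministic.
    return min(sorted(weights), key=complement_score)

def k_fold_accuracy(texts, labels, k=4):
    """Plain k-fold cross-validation with interleaved folds (index mod k)."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(texts)) if i % k != fold]
        test = [i for i in range(len(texts)) if i % k == fold]
        weights = train_cnb([texts[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_cnb(weights, texts[i]) == labels[i] for i in test)
    return correct / len(texts)

if __name__ == "__main__":
    texts, labels = zip(*DATA)
    model = train_cnb(texts, labels)
    print(predict_cnb(model, "nurse hospital"))
    print(k_fold_accuracy(texts, labels, k=4))
```

On real data one would more likely rely on an established machine learning library than on a hand-written classifier; the sketch only shows how complement word counts drive the prediction and how the held-out folds yield an accuracy estimate.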
Taxonomy
Topics: Health, Environment, Cognitive Aging · Nutritional Studies and Diet · Cardiovascular Health and Risk Factors
Methods: Support Vector Machine
