Flexible Job Classification with Zero-Shot Learning
Thom Lake

TL;DR
This paper demonstrates that zero-shot learning with fine-tuned language models can effectively classify documents under expanding taxonomies in the human resource domain. It outperforms a traditional multi-label classifier under a fixed training data budget, and filter/re-rank techniques sharply reduce its computational cost.
Contribution
It provides empirical evidence that zero-shot multi-label classification improves performance and flexibility in taxonomy expansion scenarios, with strategies to reduce computational overhead.
Findings
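When controlling for training data budget, zero-shot classifiers achieve a 12% relative increase in macro-AP over a traditional multi-label classifier trained on all classes.
In some settings, annotating more documents with an incomplete set of classes and relying on zero-shot methods for the rest can be preferable to annotating every class.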
Filter/re-rank techniques reduce computational costs by 98% with minimal performance loss.
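The summary does not say how the filter and re-rank stages are built, so the following is only a minimal sketch of the general two-stage idea under stated assumptions: a cheap scorer narrows the full class list to a few candidates, and an expensive scorer (standing in for a fine-tuned language model) is called only on those. The names `cheap_score`, `expensive_score`, and `filter_rerank`, the token-overlap scoring, and the toy labels are all hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (an assumption made for this sketch)."""
    return set(text.lower().split())


def cheap_score(doc: str, label: str) -> float:
    """Hypothetical cheap filter: Jaccard overlap of document and label tokens."""
    d, l = tokenize(doc), tokenize(label)
    union = d | l
    return len(d & l) / len(union) if union else 0.0


# Counter used to show how many expensive calls the re-rank stage makes.
calls: Dict[str, int] = {"n": 0}


def expensive_score(doc: str, label: str) -> float:
    """Stand-in for a costly model call on one (document, class) pair."""
    calls["n"] += 1
    d, l = tokenize(doc), tokenize(label)
    return len(d & l) / len(l) if l else 0.0


def filter_rerank(
    doc: str,
    labels: List[str],
    scorer: Callable[[str, str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Keep the k best labels under the cheap score, then re-rank them."""
    # Stage 1 (filter): score every label cheaply and keep the top k.
    candidates = sorted(labels, key=lambda lab: cheap_score(doc, lab), reverse=True)[:k]
    # Stage 2 (re-rank): call the expensive scorer only on the survivors.
    scored = [(lab, scorer(doc, lab)) for lab in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, purely illustrative.
doc = "senior python software engineer building data pipelines"
labels = [
    "software engineer",
    "registered nurse",
    "data analyst",
    "truck driver",
    "sales manager",
    "python developer",
]
ranked = filter_rerank(doc, labels, expensive_score, k=3)
```

In this toy run the expensive scorer is called for 3 of the 6 classes. The saving in a real system depends on how small the candidate set can be made before relevant classes start to be filtered out.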
Abstract
Using a taxonomy to organize information requires classifying objects (documents, images, etc.) with appropriate taxonomic classes. The flexible nature of zero-shot learning is appealing for this task because it allows classifiers to naturally adapt to taxonomy modifications. This work studies zero-shot multi-label document classification with fine-tuned language models under realistic taxonomy expansion scenarios in the human resource domain. Experiments show that zero-shot learning can be highly effective in this setting. When controlling for training data budget, zero-shot classifiers achieve a 12% relative increase in macro-AP when compared to a traditional multi-label classifier trained on all classes. Counterintuitively, these results suggest that in some settings it would be preferable to adopt zero-shot techniques and spend resources annotating more documents with an incomplete set of…
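For readers unfamiliar with the metric, here is a small sketch of how macro-AP and a relative increase are commonly computed. It assumes the usual non-interpolated definition of average precision and an unweighted mean over classes; the paper's exact evaluation protocol is not given here, and the labels, scores, and the 0.50 and 0.56 values below are made up for illustration, not results from the paper.

```python
from typing import Dict, List


def average_precision(y_true: List[int], scores: List[float]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank
    # Assumption: a class with no positive documents contributes 0.0.
    return total / hits if hits else 0.0


def macro_ap(y_true: Dict[str, List[int]], scores: Dict[str, List[float]]) -> float:
    """Unweighted mean of per-class average precision."""
    aps = [average_precision(y_true[c], scores[c]) for c in y_true]
    return sum(aps) / len(aps)


def relative_increase(new: float, baseline: float) -> float:
    """Relative (not absolute) change of `new` over `baseline`."""
    return (new - baseline) / baseline


# Toy per-class labels and scores over the same four documents.
y_true = {"class_a": [1, 0, 1, 0], "class_b": [0, 1, 0, 0]}
scores = {"class_a": [0.9, 0.8, 0.7, 0.1], "class_b": [0.2, 0.8, 0.1, 0.3]}
toy_macro_ap = macro_ap(y_true, scores)

# Illustrative only: a macro-AP of 0.56 against a baseline of 0.50
# is a 12% relative increase, but only 6 points in absolute terms.
toy_gain = relative_increase(0.56, 0.50)
```

The distinction matters when reading the headline number: a 12% relative increase scales with the baseline, so the absolute gain in macro-AP is smaller than 12 points whenever the baseline is below 1.0.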
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Multimodal Machine Learning Applications
