JCTC: A Large Job posting Corpus for Text Classification
Haoyu Xu (1, 2), Chongyang Gu (1, 3), Han Zhou (1), Sengpan Kou (4), Junjie Zhang (3) ((1) Shanghai Advanced Research Institute, Chinese Academy of Sciences, China, (2) University of Chinese Academy of Sciences, China, (3) Department of Communication, Information Engineering

TL;DR
This paper introduces JCTC, a large Chinese job posting corpus for text classification, created using a combination of formal standards, machine learning, and human judgment, enabling better labor market analysis.
Contribution
The paper presents the first and largest Chinese job posting corpus for text classification, built with a novel construction framework that combines a formal standard, unsupervised and supervised learning, and human judgment.
Findings
JCTC contains 102,581 job postings across 465 categories.
Benchmark results for five deep learning models on JCTC are provided.
The method reduces subjective bias and is applicable beyond the Chinese language.
Abstract
The absence of an appropriate text classification corpus makes the massive amount of online job information unusable for labor market analysis. This paper presents JCTC, a large job posting corpus for text classification. In the JCTC construction framework, a formal specification issued by the Chinese central government is chosen as the classification standard. An unsupervised learning method (WE-cos), a supervised learning algorithm (SVM), and human judgements are all used in the construction process. JCTC has 102,581 online job postings distributed across 465 categories. The proposed method not only eases the high demands on people's skill and knowledge, but also reduces subjective influences. Moreover, the method is not limited to Chinese. We benchmark five state-of-the-art deep learning approaches on JCTC, providing baseline results for future studies. JCTC might be the first job…
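As a rough illustration of the unsupervised step, the sketch below assumes that WE-cos means cosine similarity between averaged word embeddings of a posting and of a category description; the abstract does not spell this out. The vectors, tokens, and category names are hypothetical toy values, not taken from JCTC or from the government specification, and the abstract does not say how this step is ordered relative to the SVM and human judgement.

```python
import math

# Toy 3-dimensional word vectors. These values are hypothetical and exist
# only to make the example runnable; a real pipeline would load trained
# word embeddings instead.
TOY_VECTORS = {
    "software": [0.9, 0.1, 0.0],
    "developer": [0.8, 0.2, 0.1],
    "python": [0.9, 0.0, 0.1],
    "nurse": [0.0, 0.9, 0.1],
    "patient": [0.1, 0.8, 0.2],
    "hospital": [0.0, 0.9, 0.2],
}


def mean_vector(tokens):
    """Average the vectors of all known tokens; return None if none are known."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_categories(posting_tokens, category_descriptions):
    """Return (category, similarity) pairs sorted from most to least similar."""
    posting_vec = mean_vector(posting_tokens)
    if posting_vec is None:
        return []
    scores = []
    for name, tokens in category_descriptions.items():
        category_vec = mean_vector(tokens)
        if category_vec is not None:
            scores.append((name, cosine(posting_vec, category_vec)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical category descriptions standing in for entries of the standard.
categories = {
    "software engineer": ["software", "developer"],
    "nursing staff": ["nurse", "hospital"],
}
ranking = rank_categories(["python", "developer"], categories)
print(ranking[0][0])  # prints: software engineer
```

A ranking like this could supply candidate labels that a supervised classifier and human reviewers then confirm or reject, which is one plausible reading of how the three ingredients combine.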
Taxonomy
Topics: Text and Document Classification Technologies
