VietJobs: A Vietnamese Job Advertisement Dataset

Hieu Pham Dinh; Hung Nguyen Huy; Mo El-Haj

arXiv:2603.05262 · cs.CL · March 6, 2026


TL;DR

VietJobs introduces the first large-scale Vietnamese job advertisement dataset, supporting NLP and labour market research, and benchmarks several large language models on classification and salary prediction tasks.

Contribution

The paper provides the first extensive Vietnamese job advertisement dataset and evaluates state-of-the-art language models on two labour market tasks: job category classification and salary estimation.

Findings

1. Instruction-tuned LLMs show strong performance in few-shot and fine-tuned settings.
2. Challenges remain in multilingual and Vietnamese-specific structured prediction.
3. VietJobs sets a new benchmark for Vietnamese NLP research.

Abstract

VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalities across Vietnam. The dataset provides extensive linguistic and structured information, including job titles, categories, salaries, skills, and employment conditions, covering 16 occupational domains and multiple employment types (full-time, part-time, and internship). Designed to support research in natural language processing and labour market analytics, VietJobs captures substantial linguistic, regional, and socio-economic diversity. We benchmark several generative large language models (LLMs) on two core tasks: job category classification and salary estimation. Instruction-tuned models such as Qwen2.5-7B-Instruct and Llama-SEA-LION-v3-8B-IT demonstrate notable gains under few-shot and…
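The few-shot job category classification setting mentioned in the abstract can be illustrated with a minimal sketch. It assembles labelled example postings into a prompt and maps the model's free-form answer back onto a known label. The `JobAd` fields, the prompt wording, the helper names (`build_few_shot_prompt`, `parse_prediction`) and the category labels used below are all hypothetical. They are not the paper's actual prompts, schema, or 16 domain names.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobAd:
    # Hypothetical subset of fields; the real VietJobs schema may differ.
    title: str
    category: str


def _norm(text: str) -> str:
    # NFC-normalise so precomposed and decomposed Vietnamese diacritics compare equal.
    return unicodedata.normalize("NFC", text).strip().lower()


def build_few_shot_prompt(examples: List[JobAd], query_title: str,
                          categories: List[str]) -> str:
    """Assemble a few-shot classification prompt from labelled examples (hypothetical format)."""
    lines = [
        "Classify the Vietnamese job title into one of these categories:",
        ", ".join(categories),
        "",
    ]
    for ex in examples:
        lines.append(f"Job title: {ex.title}")
        lines.append(f"Category: {ex.category}")
        lines.append("")
    # The query is left open for the model to complete.
    lines.append(f"Job title: {query_title}")
    lines.append("Category:")
    return "\n".join(lines)


def parse_prediction(output: str, categories: List[str]) -> Optional[str]:
    """Map a free-form model answer onto a known category label, or None if no match."""
    text = _norm(output)
    # Try longer labels first so a label that is a prefix of another cannot shadow it.
    for cat in sorted(categories, key=len, reverse=True):
        if text.startswith(_norm(cat)):
            return cat
    return None
```

In use, the prompt would be sent to an instruction-tuned model such as Qwen2.5-7B-Instruct, and its raw completion would be passed to `parse_prediction`. Answers that match no known label return `None`, so they can be counted as errors rather than silently coerced into a class.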

Taxonomy

Topics: Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining · Machine Learning in Healthcare