ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining

Xiao Yu; Ruize Xu; Chengyuan Xue; Jinzhong Zhang; Xu Ma; Zhou Yu

arXiv:2502.12361 · cs.CL · March 4, 2025

TL;DR

ConFit v2 enhances resume-job matching by using hypothetical resume embeddings generated by a language model and a novel hard-negative mining strategy, significantly improving ranking performance on real-world datasets.

Contribution

The paper introduces two techniques, hypothetical resume augmentation and runner-up hard-negative mining, to improve contrastive learning for resume-job matching.

Findings

01

Achieves an average absolute improvement of 13.8% in recall

02

Achieves an average absolute improvement of 17.5% in nDCG

03

Outperforms prior methods including BM25 and OpenAI embeddings
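
For readers unfamiliar with the two ranking metrics named in the findings, the following is a minimal sketch of recall@k and binary-relevance nDCG@k in plain Python. The function names, the cutoff `k`, and the use of binary relevance are assumptions made for illustration; this page does not state the exact metric variants or cutoffs used in the paper.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Each relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_ids[:k])
        if item in relevant_ids
    )
    # The ideal ranking places all relevant items first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: four resumes ranked for one job, two of them relevant.
ranking = ["r3", "r1", "r7", "r2"]
relevant = {"r1", "r2"}
recall_2 = recall_at_k(ranking, relevant, 2)
ndcg_4 = ndcg_at_k(ranking, relevant, 4)
```

An "absolute improvement of 13.8% in recall" then means the metric value itself rises by 0.138 on average, rather than by 13.8% of its previous value.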

Abstract

A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, because job seekers apply to only a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit that tackles this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with a hypothetical reference resume generated by a large language model, and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-ada-002), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG…
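
The hard-negative idea in the abstract can be illustrated with a small sketch. It assumes one plausible reading of "runner-up": rank the unlabeled candidates by similarity to the query, skip the very top ones (which may be unlabeled true matches), and use the next few as negatives in a contrastive (InfoNCE-style) loss. The function names, the `skip_top` and `num_negatives` parameters, the cosine similarity, and the temperature are all hypothetical; the paper's actual mining rule and loss are not given on this page.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mine_runner_up_negatives(query_vec, candidates, skip_top, num_negatives):
    """Rank unlabeled candidates by similarity to the query, skip the
    first `skip_top`, and return the ids of the next `num_negatives`.
    (Assumed mining rule, for illustration only.)"""
    ranked = sorted(
        candidates,
        key=lambda cid: cosine(query_vec, candidates[cid]),
        reverse=True,
    )
    return ranked[skip_top:skip_top + num_negatives]

def info_nce(query_vec, positive_vec, negative_vecs, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive's similarity
    among the positive and all negatives."""
    logits = [cosine(query_vec, positive_vec) / temperature]
    logits += [cosine(query_vec, n) / temperature for n in negative_vecs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]

# Hypothetical 2-d embeddings: one job, one labeled positive, four unlabeled resumes.
job = [1.0, 0.0]
positive = [1.0, 0.1]
pool = {"a": [1.0, 0.05], "b": [1.0, 0.5], "c": [0.5, 1.0], "d": [-1.0, 0.0]}

hard_ids = mine_runner_up_negatives(job, pool, skip_top=1, num_negatives=2)
loss_hard = info_nce(job, positive, [pool[i] for i in hard_ids])
loss_easy = info_nce(job, positive, [pool["d"]])
```

In this toy setup the mined negatives yield a larger loss than an obviously dissimilar one, which is the sense in which hard negatives give the encoder a stronger training signal. The hypothetical-resume augmentation would, under the abstract's description, happen before this step by pairing each job with an LLM-generated reference resume; that generation step is not sketched here.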

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks