Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data

Zhaohua Liang; Zhilin Wang; Renjie Cao; Yining Zhang

arXiv:2604.17738 · cs.CL · April 21, 2026

TL;DR

This paper introduces mira-embeddings-v1, a semantic reranking system for recruitment that leverages LLM-synthesized data and boundary-aware reranking to improve candidate retrieval performance.
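The two-stage setup can be pictured as follows: an upstream retriever hands over a shortlist, and the reranker reorders it by how close each candidate's embedding is to the JD embedding. This is a minimal sketch that assumes candidates are scored by plain cosine similarity. The function name `rerank_shortlist` and the toy vectors are hypothetical; in the actual system the embeddings would come from the fine-tuned model, not from hand-written arrays.

```python
import numpy as np

def rerank_shortlist(jd_vec, cand_vecs, cand_ids):
    """Reorder a retriever's shortlist by cosine similarity to the JD embedding.

    jd_vec: (dim,) JD embedding; cand_vecs: (n, dim) candidate embeddings.
    Returns candidate IDs and their scores, best first.
    """
    q = jd_vec / np.linalg.norm(jd_vec)
    c = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    scores = c @ q                                # cosine similarity per candidate
    order = np.argsort(-scores, kind="stable")    # descending, ties keep retriever order
    return [cand_ids[i] for i in order], scores[order]

# Toy shortlist from a hypothetical upstream retriever.
jd = np.array([1.0, 0.0, 0.0])
cands = np.array([[0.0, 1.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.5, 0.5, 0.0]])
ids = ["cv_a", "cv_b", "cv_c"]

ranked, s = rerank_shortlist(jd, cands, ids)
print(ranked)  # prints: ['cv_b', 'cv_c', 'cv_a']
```

Using a stable sort means candidates with equal scores keep the order the retriever gave them, so the reranker never shuffles ties arbitrarily.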

Contribution

The paper presents a domain-adapted semantic reranking approach that combines LLM-generated supervision with a lightweight reranking head, removing the need for large-scale manual labels.
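One way to picture a "lightweight reranking head" is as a small learned correction applied on top of embedding similarity. The sketch below is hypothetical because the summary does not specify the head's architecture. The form s(q, d) = cos(q, d) + w · (q̂ ⊙ d̂) + b, the names `head_score`, `unit`, `w`, and `b`, and the weights shown are all assumptions. They were chosen to illustrate how such a correction can reorder two candidates sitting near a decision boundary.

```python
import numpy as np

def unit(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def head_score(q, D, w, b=0.0):
    """Hypothetical lightweight head: cosine plus a learned linear correction.

    s(q, d) = cos(q, d) + w . (q_hat * d_hat) + b
    q: (dim,) JD embedding; D: (n, dim) candidate embeddings; w: (dim,) weights.
    """
    qh, Dh = unit(q), unit(D)
    interaction = Dh * qh            # element-wise product features, shape (n, dim)
    cos = interaction.sum(axis=1)    # the features sum to the cosine similarity
    return cos + interaction @ w + b

# Toy boundary case: plain cosine slightly prefers candidate 0.
q = np.array([1.0, 0.9, 0.0])
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

base = head_score(q, D, w=np.zeros(3))                  # pure cosine
tuned = head_score(q, D, w=np.array([0.0, 0.5, 0.0]))   # hypothetical learned weights
print(np.argmax(base), np.argmax(tuned))                # prints: 0 1
```

Because the interaction features sum to the cosine, this particular head amounts to a per-dimension reweighting of the similarity, which keeps it cheap to train. A real head could just as well be a small MLP or a cross-encoder.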

Findings

01. Recall@50 improved from 68.89% to 77.55% (+8.66 percentage points).
02. Precision@10 increased from 35.77% to 39.62% (+3.85 percentage points).
03. Recall@200 on the global pool reached 0.7047 (70.47%).
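The findings are reported as Recall@K and Precision@K. The sketch below gives the standard per-query definitions, macro-averaged over JDs. It assumes that each JD has a known set of qualified candidate IDs and a ranked list from the reranker. The helper names and toy data are hypothetical. The summary does not say how the paper averages results or how it treats lists shorter than K, so this version divides Precision@K by K and averages uniformly across JDs.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of a JD's qualified candidates that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for cid in ranked[:k] if cid in relevant)
    return hits / len(relevant)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k slots filled by qualified candidates (denominator is k)."""
    hits = sum(1 for cid in ranked[:k] if cid in relevant)
    return hits / k

def mean_at_k(metric, runs, k):
    """Macro-average a per-JD metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Two toy JDs with hypothetical candidate IDs.
runs = [
    (["c1", "c2", "c3", "c4", "c5"], {"c2", "c5", "c9"}),
    (["d1", "d2", "d3", "d4", "d5"], {"d1"}),
]
print(mean_at_k(recall_at_k, runs, 5))     # (2/3 + 1) / 2
print(mean_at_k(precision_at_k, runs, 5))  # (2/5 + 1/5) / 2
```

Recall@K only counts qualified candidates that were retrievable at all. A candidate such as "c9", missing from the shortlist, caps recall no matter how well the reranker orders the list.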

Abstract

Candidate sourcing for recruiters is best viewed as a two-stage retrieval and reranking pipeline with recall as the primary objective under a limited review budget. An upstream production retriever first returns a candidate shortlist for each job description (JD), and our goal is to rerank that shortlist so that qualified candidates appear as high as possible. We present mira-embeddings-v1, a semantic reranking system for the recruitment domain that reshapes the embedding space with LLM-synthesized training data and corrects boundary confusions with a lightweight reranking head. Starting from real JDs, we build a five-stage prompt pipeline to generate diverse positive and hard negative samples that sculpt the semantic space from multiple angles. We then apply a two-round LoRA adaptation: JD–JD contrastive training followed by JD–CV triplet alignment on a heterogeneous text dataset.…
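The two training rounds pair a contrastive objective (JD–JD) with a triplet objective (JD–CV), both applied through LoRA adapters. The following numpy sketch shows standard forms of these pieces under stated assumptions:

- an in-batch InfoNCE loss for the JD–JD round;
- a cosine-distance triplet loss with a margin for the JD–CV round;
- a LoRA-style low-rank update on a frozen linear layer.

The temperature, margin, rank, and scaling values are hypothetical, and so are all the function names. The abstract does not give the exact loss formulations or hyperparameters. Real training would also compute gradients in a deep-learning framework rather than in numpy.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce(anchors, positives, temperature=0.05):
    """Round 1 (JD-JD): in-batch contrastive loss.

    Row i of `positives` is the positive for anchor i; every other row in the
    batch serves as a negative.
    """
    a, p = l2_normalize(anchors), l2_normalize(positives)
    logits = (a @ p.T) / temperature                  # (batch, batch) cosine / T
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def triplet_loss(jd, cv_pos, cv_neg, margin=0.2):
    """Round 2 (JD-CV): hinge on cosine distance; the qualified CV must be
    closer to the JD than the negative CV by at least `margin`."""
    a, p, n = l2_normalize(jd), l2_normalize(cv_pos), l2_normalize(cv_neg)
    d_pos = 1.0 - np.sum(a * p, axis=1)
    d_neg = 1.0 - np.sum(a * n, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def lora_linear(x, W, A, B, alpha=16.0):
    """Frozen weight W (out, in) plus a trainable low-rank update B @ A,
    scaled by alpha / r."""
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
jd = rng.normal(size=(4, 8))
jd_para = jd + 0.01 * rng.normal(size=(4, 8))   # toy positives: near-paraphrase JDs
shuffled = jd_para[[1, 2, 3, 0]]                 # misaligned pairs

print(info_nce(jd, jd_para) < info_nce(jd, shuffled))   # aligned pairs give lower loss
print(triplet_loss(jd, jd_para, -jd))                   # toy negative; margin met -> 0.0

W = rng.normal(size=(8, 8))
A = rng.normal(size=(2, 8))   # rank r = 2
B = np.zeros((8, 2))          # common LoRA init: B = 0, so the update starts at zero
print(np.allclose(lora_linear(jd, W, A, B), jd @ W.T))  # True
```

Initializing `B` to zero means the adapted model starts out identical to the base model. Each training round then moves only the small `A` and `B` matrices.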
