Mira-Embeddings-V1: Domain-Adapted Semantic Reranking for Recruitment via LLM-Synthesized Data
Zhaohua Liang, Zhilin Wang, Renjie Cao, Yining Zhang

TL;DR
This paper introduces mira-embeddings-v1, a semantic reranking system for recruitment that leverages LLM-synthesized data and boundary-aware reranking to improve candidate retrieval performance.
Contribution
It presents a novel domain-adapted semantic reranking approach using LLM-generated supervision and a lightweight reranking head, eliminating the need for large-scale manual labels.
Findings
Recall@50 improved from 68.89% to 77.55%.
Precision@10 increased from 35.77% to 39.62%.
Recall@200 on the global pool reached 70.47%.
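For readers unfamiliar with the metrics above, the following is a minimal sketch of how Recall@K and Precision@K are commonly computed for one job description. The function names and the toy data are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol (for example, how scores are averaged across JDs) may differ.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant candidates that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention assumed here: no relevant items means recall 0
    hits = len(set(ranked_ids[:k]) & relevant)
    return hits / len(relevant)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant (denominator is k)."""
    if k <= 0:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for cid in ranked_ids[:k] if cid in relevant)
    return hits / k


# Toy example (hypothetical candidate ids): 3 qualified candidates, 4 ranked.
ranked = ["a", "b", "c", "d"]
qualified = {"a", "d", "z"}
r2 = recall_at_k(ranked, qualified, 2)     # only "a" is in the top 2
p2 = precision_at_k(ranked, qualified, 2)  # 1 of the top 2 is qualified
```

Under a fixed review budget of K candidates, recall measures how many qualified people the recruiter actually sees, which is why the summary reports it at several cutoffs.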
Abstract
Candidate sourcing for recruiters is best viewed as a two-stage retrieval and reranking pipeline with recall as the primary objective under a limited review budget. An upstream production retriever first returns a candidate shortlist for each job description (JD), and our goal is to rerank that shortlist so that qualified candidates appear as high as possible. We present mira-embeddings-v1, a semantic reranking system for the recruitment domain that reshapes the embedding space with LLM-synthesized training data and corrects boundary confusions with a lightweight reranking head. Starting from real JDs, we build a five-stage prompt pipeline to generate diverse positive and hard negative samples that sculpt the semantic space from multiple angles. We then apply a two-round LoRA adaptation: JD–JD contrastive training followed by JD–CV triplet alignment on a heterogeneous text dataset.…
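The two ideas in the abstract, reranking a retrieved shortlist by embedding similarity and training with a JD–CV triplet objective, can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the cosine scoring, the hinge-style triplet loss, the margin value of 0.2, and the toy two-dimensional vectors are all hypothetical choices, and the paper's reranking head and LoRA training setup are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def rerank(jd_vec, shortlist):
    """Sort (candidate_id, cv_vec) pairs by similarity to the JD, best first."""
    return sorted(shortlist, key=lambda item: cosine(jd_vec, item[1]), reverse=True)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive beats the negative by `margin`.

    `margin` is an assumed hyperparameter, not a value from the paper.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


# Toy embeddings (hypothetical): a JD vector and three CV vectors.
jd = [1.0, 0.0]
shortlist = [("x", [0.0, 1.0]), ("y", [1.0, 0.0]), ("z", [1.0, 1.0])]
order = [cid for cid, _ in rerank(jd, shortlist)]  # most similar CV first
```

In a triplet setup of this kind, the JD acts as the anchor, a matching CV as the positive, and a hard negative CV as the negative, so minimizing the loss pulls matching CVs toward the JD in the embedding space.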
