Towards Better Search with Domain-Aware Text Embeddings for C2C Marketplaces
Andre Rusli, Miao Cao, Shoma Ishimoto, Sho Akiyama, Max Frenzel

TL;DR
This paper presents a domain-aware Japanese text-embedding approach for C2C marketplaces. It improves search relevance and efficiency by fine-tuning on purchase-driven query-title pairs and by producing compact, truncation-robust Matryoshka embeddings, and it is validated by offline and online experiments.
Contribution
Introduces a novel domain-aware Japanese text-embedding method with role-specific modeling and Matryoshka truncation for improved C2C marketplace search.
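As a rough illustration of what Matryoshka truncation means at serving time, the sketch below keeps only the first few coordinates of an embedding and re-normalizes them before computing cosine similarity. The 8-dimensional vectors, the function names, and the chosen dimensions are made up for the example; the summary does not state the model's actual dimensionality or the truncation sizes used in production.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for encoder outputs (hypothetical values).
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, -0.1, 0.02]
item_a = [0.8, 0.2, 0.25, -0.1, -0.3, 0.4, 0.1, 0.0]
item_b = [-0.1, 0.9, -0.4, 0.3, 0.2, 0.1, 0.0, 0.5]

for dim in (8, 4, 2):
    q = truncate_embedding(query, dim)
    scores = {
        name: cosine(q, truncate_embedding(vec, dim))
        for name, vec in (("item_a", item_a), ("item_b", item_b))
    }
    print(dim, scores)
```

Unlike PCA compression, this needs no separately fitted projection matrix: an index built on the shorter vectors only has to slice and re-normalize the stored embeddings. Whether ranking quality survives the truncation depends on the model having been trained for it.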
Findings
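Consistent offline gains over a strong generic encoder on historical search logs, with particularly large improvements when Matryoshka truncation replaces PCA compression.
A manual assessment shows better handling of proper nouns and marketplace-specific semantics.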
Online A/B tests show increased revenue and search efficiency.
Abstract
Consumer-to-consumer (C2C) marketplaces pose distinct retrieval challenges: short, ambiguous queries; noisy, user-generated listings; and strict production constraints. This paper reports our experiments in building a domain-aware Japanese text-embedding approach to improve search quality at Mercari, Japan's largest C2C marketplace. We fine-tune on purchase-driven query-title pairs, using role-specific prefixes to model query-item asymmetry. To meet production constraints, we apply Matryoshka Representation Learning to obtain compact, truncation-robust embeddings. Offline evaluation on historical search logs shows consistent gains over a strong generic encoder, with particularly large improvements when replacing PCA compression with Matryoshka truncation. A manual assessment further highlights better handling of proper nouns, marketplace-specific semantics, and…
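The training recipe described in the abstract can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the prefix strings `query: ` and `item: ` are hypothetical, the in-batch softmax contrastive loss is an assumed choice (the excerpt does not name the loss), and the nested dimensions and uniform weighting are placeholders. What does follow from Matryoshka Representation Learning itself is the idea of summing the training loss over nested prefixes of the embedding.

```python
import math

# Hypothetical role prefixes; the excerpt does not say which strings are used.
QUERY_PREFIX = "query: "
ITEM_PREFIX = "item: "


def format_query(text):
    """Mark a search query with its role before encoding."""
    return QUERY_PREFIX + text.strip()


def format_item(title):
    """Mark a listing title with its role before encoding."""
    return ITEM_PREFIX + title.strip()


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)


def in_batch_contrastive_loss(query_vecs, item_vecs, temperature=0.05):
    """Mean softmax cross-entropy where item i is the positive for query i
    and every other item in the batch acts as a negative (assumed loss)."""
    items = [_normalize(v) for v in item_vecs]
    total = 0.0
    for i, q in enumerate(query_vecs):
        q = _normalize(q)
        logits = [sum(a * b for a, b in zip(q, it)) / temperature for it in items]
        top = max(logits)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(query_vecs)


def matryoshka_loss(query_vecs, item_vecs, dims=(2, 4)):
    """Sum the contrastive loss over nested prefixes of the embedding,
    so that the leading coordinates remain useful on their own."""
    return sum(
        in_batch_contrastive_loss(
            [q[:d] for q in query_vecs], [v[:d] for v in item_vecs]
        )
        for d in dims
    )
```

In an actual fine-tuning setup, `format_query` and `format_item` would be applied to each purchased query-title pair before tokenization, and the loss would be computed on encoder outputs with automatic differentiation rather than on plain Python lists.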
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies
