Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
Xuhong He, To Eun Kim, Maik Fröbe, Jaime Arguello, Bhaskar Mitra, Fernando Diaz

TL;DR
This paper introduces a large-scale multilingual Tip-of-the-Tongue benchmark, constructed using an LLM-based framework, to evaluate retrieval systems across Chinese, Japanese, Korean, and English.
Contribution
It presents the first multilingual ToT test collections with 5,000 queries per language, and analyzes language-aware design choices for synthetic query generation.
Findings
Effective ToT simulation depends on language-aware prompt design.
Non-English sources are crucial for realistic query generation.
English Wikipedia can enhance synthetic queries when non-English sources lack information.
Abstract
Tip-of-the-Tongue (ToT) retrieval benchmarks have largely focused on English, limiting their applicability to multilingual information access. In this work, we construct multilingual ToT test collections for Chinese, Japanese, Korean, and English, using an LLM-based query simulation framework. We systematically study how prompt language and source document language affect the fidelity of simulated ToT queries, validating synthetic queries through system rank correlation against real user queries. Our results show that effective ToT simulation requires language-aware design choices: non-English language sources are generally important, while English Wikipedia can be beneficial when non-English sources provide insufficient information for query generation. Based on these findings, we release four ToT test collections with 5,000 queries per language across multiple domains. This work…
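The abstract validates synthetic queries by system rank correlation: the same retrieval systems are scored once on real user queries and once on simulated ones, and the two resulting system orderings are compared. The sketch below illustrates the idea with Kendall's tau, which is an assumption here, since the abstract does not name the correlation coefficient. The system names and scores are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts (no tie correction)."""
    if set(scores_a) != set(scores_b):
        raise ValueError("both score sets must cover the same systems")
    systems = sorted(scores_a)
    if len(systems) < 2:
        raise ValueError("need at least two systems to compare rankings")

    concordant = 0
    discordant = 0
    for s, t in combinations(systems, 2):
        # A pair is concordant when both query sets order the two systems the same way.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical effectiveness scores (e.g. nDCG) for four hypothetical systems.
real_query_scores = {"bm25": 0.12, "dense": 0.21, "hybrid": 0.25, "reranker": 0.31}
synthetic_query_scores = {"bm25": 0.18, "dense": 0.30, "hybrid": 0.28, "reranker": 0.40}

tau = kendall_tau(real_query_scores, synthetic_query_scores)
print(f"Kendall's tau between real and synthetic rankings: {tau:.3f}")
```

A value near 1 would indicate that the simulated queries rank systems in nearly the same order as real queries do. In this toy data only one of the six system pairs is swapped.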
