STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec

TL;DR
STaRK is a large-scale benchmark designed to evaluate retrieval systems on semi-structured knowledge bases across multiple domains, highlighting current limitations of LLM-driven retrieval methods.
Contribution
We introduce STaRK, a comprehensive benchmark with synthesized and human-generated queries for semi-structured retrieval, addressing a gap in existing evaluation resources.
Findings
Current retrieval systems struggle with STaRK's complex queries.
LLMs face significant challenges in semi-structured retrieval tasks.
The benchmark reveals the need for more advanced semi-structured retrieval methods.
Abstract
Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, many previous works studied textual and relational retrieval tasks as separate topics. To address the gap, we develop STaRK, a large-scale Semi-structured retrieval benchmark on Textual and Relational Knowledge Bases. Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine. We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties, together with their ground-truth answers (items). We conduct rigorous human evaluation to validate the quality of our synthesized queries.…
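To make "semi-structured" concrete, the sketch below models a toy knowledge base where each entity carries both a free-text description (unstructured) and typed relations to other entities (structured), then derives a query and its ground-truth answers by combining a relational constraint with a textual property. Everything here is hypothetical: the `Entity` schema, `SemiStructuredKB`, `synthesize_query`, and the sample data are illustrative assumptions, not the STaRK codebase or its data format. In particular, the textual step uses plain keyword matching as a stand-in; this is a deliberate simplification and not how the authors' pipeline judges textual properties.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in a toy semi-structured KB (hypothetical schema)."""
    eid: str
    etype: str
    name: str
    text: str  # unstructured part: free-text description
    # structured part: relation name -> list of target entity ids
    relations: dict[str, list[str]] = field(default_factory=dict)


class SemiStructuredKB:
    """Hypothetical container indexing entities by id."""

    def __init__(self, entities: list[Entity]) -> None:
        self.entities = {e.eid: e for e in entities}


def synthesize_query(kb: SemiStructuredKB, target_type: str, relation: str,
                     anchor_id: str, keyword: str) -> tuple[str, set[str]]:
    """Hypothetical query synthesis: relational constraint + textual property.

    Returns the query string and the set of ground-truth answer ids.
    """
    # Relational step: entities of target_type linked to the anchor via `relation`.
    candidates = [
        e for e in kb.entities.values()
        if e.etype == target_type and anchor_id in e.relations.get(relation, [])
    ]
    # Textual step: keep candidates whose description mentions the keyword.
    # ASSUMPTION: keyword matching stands in for a more robust textual check.
    answers = {e.eid for e in candidates if keyword.lower() in e.text.lower()}
    anchor = kb.entities[anchor_id]
    query = f"Find a {target_type} whose {relation} is {anchor.name} and that is {keyword}."
    return query, answers


# Usage with made-up product-search data.
kb = SemiStructuredKB([
    Entity("b1", "brand", "Acme", "Outdoor gear maker."),
    Entity("p1", "product", "Trail Tent", "Lightweight waterproof tent for two.", {"brand": ["b1"]}),
    Entity("p2", "product", "Camp Chair", "Folding chair with cup holder.", {"brand": ["b1"]}),
    Entity("p3", "product", "Rain Shell", "Waterproof jacket.", {"brand": []}),
])
query, answers = synthesize_query(kb, "product", "brand", "b1", "waterproof")
print(query)            # Find a product whose brand is Acme and that is waterproof.
print(sorted(answers))  # ['p1']
```

Neither source of information suffices alone here: "Rain Shell" matches the text but not the relation, and "Camp Chair" matches the relation but not the text. That is the kind of query a purely textual or purely relational retriever would get wrong.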
Taxonomy
Topics: Natural Language Processing Techniques · Library Science and Information Systems
