Kinship Data Benchmark for Multi-hop Reasoning

Tianda Sun; Dimitar Kazakov

arXiv:2601.07794·cs.CL·January 13, 2026

TL;DR

KinshipQA is a new benchmark that evaluates large language models' ability to perform multi-hop reasoning over culturally specific genealogical data, revealing differences in reasoning skills across models and cultures.

Contribution

We introduce KinshipQA, a benchmark built on a generative pipeline that creates large-scale, culture-specific genealogical data for evaluating multi-hop reasoning in LLMs.

Findings

01. Models show varied performance across cultural contexts.

02. KinshipQA exposes systematic reasoning differences among models.

03. The benchmark enables controlled variation of task difficulty and cultural assumptions.
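
To make "controlled variation of task difficulty" concrete, here is a minimal sketch of how hop count could serve as a difficulty knob. It is an illustration only, not the paper's pipeline: the toy genealogy, the names, and the helpers `build_edges` and `relation_chain` are all hypothetical.

```python
from collections import deque

# Hypothetical toy genealogy: child -> list of parents (all names invented).
PARENTS = {
    "Cara": ["Ann", "Bob"],
    "Dan": ["Ann", "Bob"],
    "Eve": ["Cara"],
    "Finn": ["Dan"],
}


def build_edges(parents):
    """Turn the child -> parents map into labelled one-hop facts in both directions."""
    edges = {}
    for child, ps in parents.items():
        for p in ps:
            edges.setdefault(child, []).append(("parent", p))
            edges.setdefault(p, []).append(("child", child))
    return edges


def relation_chain(edges, source, target):
    """Breadth-first search for the shortest chain of one-hop facts.

    Returns the list of edge labels from source to target, or None if
    the two people are not connected.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        person, chain = queue.popleft()
        if person == target:
            return chain
        for label, other in edges.get(person, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, chain + [label]))
    return None


EDGES = build_edges(PARENTS)

# Eve -> Cara -> Ann -> Dan -> Finn: a four-hop chain linking two cousins.
chain = relation_chain(EDGES, "Eve", "Finn")
print(chain, "hops:", len(chain))
```

Under this assumption, a question about Eve and Ann needs two hops while one about Eve and Finn needs four, so selecting pairs by chain length gives one plausible way to grade difficulty.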

Abstract

Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introduce KinshipQA, a benchmark designed to probe this capability through reasoning over kinship relations. The central contribution of our work is a generative pipeline that produces, on demand, large-scale, realistic, and culture-specific genealogical data: collections of interconnected family trees that satisfy explicit marriage constraints associated with different kinship systems. This allows task difficulty, cultural assumptions, and relational depth to be systematically controlled and varied. From these genealogies, we derive textual inference tasks that require reasoning over implicit relational chains. We evaluate the resulting benchmark using six state-of-the-art LLMs, spanning both open-source…
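
The idea of family trees that "satisfy explicit marriage constraints" can be sketched as a rule applied when pairing individuals. The excerpt does not state which constraints the authors use, so the clan-exogamy rule below is purely an assumed example, and `Person`, `exogamy_ok`, and `valid_marriages` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    clan: str  # assumed attribute; real kinship systems may key on other properties


def exogamy_ok(a, b):
    """Assumed example rule: spouses must belong to different clans."""
    return a.clan != b.clan


def valid_marriages(people, rule):
    """List every unordered pair of people that the given rule permits."""
    pairs = []
    for i, a in enumerate(people):
        for b in people[i + 1:]:
            if rule(a, b):
                pairs.append((a.name, b.name))
    return pairs


PEOPLE = [Person("Ann", "Red"), Person("Bob", "Blue"), Person("Cy", "Red")]
PAIRS = valid_marriages(PEOPLE, exogamy_ok)
print(PAIRS)
```

Passing the rule as a function means a different kinship system could be modelled by swapping in a different predicate while the pairing logic stays unchanged.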

Taxonomy

Topics: Topic Modeling · Language and cultural evolution · Advanced Graph Neural Networks