Structured RAG for Answering Aggregative Questions
Omri Koshorek, Niv Granot, Aviv Alloni, Shahar Admati, Roee Hendel, Ido Weiss, Alan Arazi, Shay-Nitzan Cohen, Yonatan Belinkov

TL;DR
This paper introduces S-RAG, a novel retrieval-augmented generation method tailored for aggregative questions. It constructs a structured representation of the corpus and translates natural-language queries into formal queries over it, outperforming existing systems.
Contribution
The paper presents S-RAG, a new approach to aggregative question answering that builds a structured representation of the corpus and translates queries into formal queries over it, along with two new datasets for evaluation.
Findings
S-RAG outperforms existing RAG systems on new aggregative datasets.
S-RAG surpasses long-context LLMs in answering aggregative questions.
Introduction of HOTELS and WORLD CUP datasets for aggregative query evaluation.
Abstract
Retrieval-Augmented Generation (RAG) has become the dominant approach for answering questions over large corpora. However, current datasets and methods are highly focused on cases where only a small part of the corpus (usually a few paragraphs) is relevant per query, and fail to capture the rich world of aggregative queries. These require gathering information from a large set of documents and reasoning over them. To address this gap, we propose S-RAG, an approach specifically designed for such queries. At ingestion time, S-RAG constructs a structured representation of the corpus; at inference time, it translates natural-language queries into formal queries over said representation. To validate our approach and promote further research in this area, we introduce two new datasets of aggregative queries: HOTELS and WORLD CUP. Experiments with S-RAG on the newly introduced datasets, as…
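To make the ingestion/inference split concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. It assumes SQL as the formal query language and a single flat table. The schema, the HOTELS-style records, and the question-to-SQL translations are all hypothetical and hand-written, standing in for the automated steps S-RAG performs; none of it is taken from the paper or its datasets.

```python
import sqlite3

# HYPOTHETICAL schema. In S-RAG the structure is derived from the corpus at
# ingestion time; here it is hand-written purely for illustration.
SCHEMA = {
    "name": "TEXT",
    "city": "TEXT",
    "stars": "INTEGER",
    "price_per_night": "REAL",
}

# HYPOTHETICAL records, one per hotel document, standing in for the output of
# per-document extraction. These values are invented, not from HOTELS.
RECORDS = [
    {"name": "Hotel A", "city": "Paris", "stars": 4, "price_per_night": 180.0},
    {"name": "Hotel B", "city": "Paris", "stars": 4, "price_per_night": 220.0},
    {"name": "Hotel C", "city": "Paris", "stars": 3, "price_per_night": 110.0},
    {"name": "Hotel D", "city": "Rome", "stars": 4, "price_per_night": 150.0},
]


def ingest(records, schema):
    """Ingestion: load structured records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f"{col} {typ}" for col, typ in schema.items())
    conn.execute(f"CREATE TABLE hotels ({col_defs})")
    col_names = ", ".join(schema)
    placeholders = ", ".join("?" for _ in schema)
    # Missing attributes become NULL via dict.get, so every record fits the schema.
    rows = [tuple(rec.get(col) for col in schema) for rec in records]
    conn.executemany(
        f"INSERT INTO hotels ({col_names}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return conn


def answer(conn, sql):
    """Inference: execute a formal query against the whole table."""
    return conn.execute(sql).fetchall()


conn = ingest(RECORDS, SCHEMA)

# Natural-language questions translated to SQL by hand here; S-RAG performs
# this translation automatically.
# Q: "What is the average nightly price of 4-star hotels in Paris?"
sql = "SELECT AVG(price_per_night) FROM hotels WHERE city = 'Paris' AND stars = 4"
print(answer(conn, sql))  # [(200.0,)]

# Q: "How many hotels have at least 4 stars?"
print(answer(conn, "SELECT COUNT(*) FROM hotels WHERE stars >= 4"))  # [(3,)]
```

The aggregate is computed over every record in the table. As a result, the answer does not depend on which few passages a top-k retriever happens to return, which is the failure mode the abstract attributes to standard RAG on aggregative queries.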
Peer Reviews
Decision: Submitted to ICLR 2026
Clear problem definition: The paper clearly identifies the limitations of current RAG systems in handling “aggregative queries,” an important real-world scenario. Strong methodological innovation: S-RAG transforms unstructured documents into a structured database and leverages formal queries for reasoning, a novel and practical approach. Valuable dataset contribution: The two proposed datasets fill an existing gap in RAG datasets regarding aggregative queries, providing valuable research resources…
Strong methodological assumptions: S-RAG assumes that all documents share the same structure (i.e., a single entity type), which may not hold in real-world multi-entity document scenarios. Limited generalization: The method’s performance on complex structures (e.g., nested attributes or list-type data) remains unverified, restricting its applicability to more complex corpora. Small dataset scale: The Hotels and World Cup datasets are relatively small, and the method’s scalability in large-scale…
1. This paper focuses on a practical question, aggregative queries, which are frequently encountered in daily life. General RAG methods struggle with this type of query, and this study presents a timely approach to address that limitation. 2. The paper's presentation is clear and easy to follow. 3. The datasets are publicly available, which ensures reproducibility.
1. The *Hotels* dataset is created by prompting the LLM with hand-crafted attributes, so it is already structured to some degree, making it too easy for an LLM to guess the attributes. This differs from the authors' claim that this work targets unstructured corpora. 2. The synthetic question-answer pairs have not been verified by humans, raising concerns about data quality. 3. The proposed method is too straightforward, and there are existing studies with similar methods: Zhang, Wen, et al.…
S1) The paper identifies aggregative queries as a distinct and practically important class of QA problems inadequately handled by existing RAG and multi-hop QA systems. Framing this as a structured reasoning challenge is conceptually fresh and well-motivated. S2) The S-RAG architecture is clearly described, with distinct ingestion and inference phases. The schema-induction process and record standardization pipeline are intuitively presented and easy to follow. S3) The two new datasets fill a…
O1) While the paper introduces a creative conceptual shift, the technical implementation largely relies on prompting existing LLMs for schema induction, record extraction, and SQL generation. There is minimal methodological innovation beyond careful prompt design. O2) The entire pipeline hinges on the reliability of LLM-generated schemas and records. These are prone to variability, omission, and inconsistency (as the authors themselves note). Without quantitative analysis of schema accuracy or…
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications
