RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning

TL;DR
RAPTOR introduces a recursive tree-based summarization and retrieval method that enhances language models' ability to understand and reason over long documents, achieving state-of-the-art results in complex question-answering tasks.
Contribution
The paper presents a novel recursive embedding and summarization technique that constructs a hierarchical tree for improved document retrieval and understanding in language models.
Findings
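Significant improvements over traditional retrieval-augmented LMs on several tasks in controlled experiments.
20% absolute accuracy gain over the previous best result on the QuALITY benchmark when RAPTOR retrieval is coupled with GPT-4.
State-of-the-art results reported on question-answering tasks that involve complex, multi-step reasoning.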
Abstract
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR…
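The embed, cluster, summarize loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: `embed` is a toy bag-of-words vector standing in for a real sentence encoder, `cluster` is a greedy nearest-neighbour grouping standing in for the paper's clustering step, and `summarize` is a truncating placeholder where a real system would call a language model. All function names, the `Node` class, and the `cluster_size` parameter are hypothetical. Retrieval here searches every node at every level at once, which is only one possible way to query the tree.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int                      # 0 = leaf chunk, higher = more abstract
    children: list = field(default_factory=list)


def embed(text):
    """Toy bag-of-words embedding (ASSUMPTION: stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(nodes, size):
    """Greedy grouping (ASSUMPTION: stand-in for a real clustering algorithm).
    Seeds a group with the first unassigned node and adds its nearest neighbours."""
    remaining = list(nodes)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        seed_vec = embed(seed.text)
        remaining.sort(key=lambda n: cosine(seed_vec, embed(n.text)), reverse=True)
        groups.append([seed] + remaining[: size - 1])
        remaining = remaining[size - 1:]
    return groups


def summarize(texts, max_words=12):
    """Placeholder summarizer (ASSUMPTION: a real system would call an LLM).
    Keeps the leading words of each input text."""
    per_text = max(1, max_words // len(texts))
    return " ".join(" ".join(t.split()[:per_text]) for t in texts)


def build_tree(chunks, cluster_size=2):
    """Build layers bottom-up until a single root summary remains."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so each layer shrinks")
    layer = [Node(c, 0) for c in chunks]
    layers = [layer]
    while len(layer) > 1:
        groups = cluster(layer, cluster_size)
        layer = [Node(summarize([n.text for n in g]), len(layers), g) for g in groups]
        layers.append(layer)
    return layers


def retrieve(layers, query, k=2):
    """Rank all nodes from all levels against the query and return the top k."""
    q = embed(query)
    pool = [n for layer in layers for n in layer]
    return sorted(pool, key=lambda n: cosine(q, embed(n.text)), reverse=True)[:k]


if __name__ == "__main__":
    demo_chunks = [
        "Cinderella lives with her stepmother and cleans the house.",
        "The prince hosts a royal ball at the palace.",
        "Cinderella loses a glass slipper at midnight.",
        "The prince searches the kingdom for the slipper owner.",
    ]
    tree = build_tree(demo_chunks)
    for hit in retrieve(tree, "glass slipper midnight"):
        print(hit.level, hit.text)
```

Because leaves and summaries share one candidate pool, a detail-oriented query tends to match a leaf chunk while a thematic query can match a higher-level summary. Swapping in a real encoder, clustering method, and summarizer changes only the three stand-in functions.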
Peer Reviews
Decision: ICLR 2024 poster
1. Strong results compared to multiple baselines, but some choices of baselines are poorly justified and claims of SOTA are not correct. 2. An interesting approach to generate a set of summaries to retrieve from. The nature of the summaries is probably the main value here, especially given that the tree structure of the summaries is mostly ignored. Further, despite other claims of easy scalability, this approach will probably not scale immediately to larger retrieval datasets. Perhaps the
1. There is very little analysis of the summarized outputs. The analysis included in the main text, Table 6, is difficult to understand and does not reveal much about the content of the summaries. Based on an in-line example, it seems the benefit of this approach may be the abstractive nature of the summaries, and it would be helpful to verify further whether this is the case or some other property is helping. 2a. The baselines in Tables 1 and 2 are not well justified. It seems like BM25 and DPR
- The proposed method, RAPTOR, can indirectly retrieve long texts by tracking automatically created, tree-structured clusters. - RAPTOR can decide the number of clusters automatically, and thus it doesn't require manual tuning to create the tree-structured clusters. - RAPTOR can achieve better QA performance than commonly used retrievers.
- Considering the information loss caused by summarization, the benefit of RAPTOR over enumerating possible concatenations of chunks is uncertain. - Even if clustering is done automatically, how to segment texts into chunks remains an open problem. - When comparing model performance, the parameter counts of the models should be the same or similar. However, the paper compares models with very different parameter counts, which is problematic for fairness. - How many retrieved instances ar
1. Overall an elegant, well-motivated, and seemingly novel (to my knowledge) approach. It can set up a new paradigm baseline for RAG that can incite future research on refining the general idea. 2. Decent performance compared to other paradigmatic approaches for retrieval such as DPR.
1. One limitation seems to be that the approach requires recursive summarization with an LLM, which can add to computational expense (it would also be good to share the trade-off). 2. While I get the theoretical intuition of clustering (preventing information loss by clustering more homogeneous content for summarization), it would have been nice to have an empirical demonstration of the effectiveness of clustering. A possible ablation could be: what if we took a balanced tree-style encoding/summar
Taxonomy
Topics: Semantic Web and Ontologies · Data Management and Algorithms · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Residual Connection · Dropout · Layer Normalization · Multi-Head Attention · Adam · Softmax · Dense Connections
