Retrieval is Accurate Generation
Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

TL;DR
This paper introduces a retrieval-based text generation method that selects context-aware phrases from documents, significantly improving accuracy and quality over standard language models in knowledge-intensive and open-ended tasks.
Contribution
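The paper proposes a novel retrieval-based generation approach that selects phrases from supporting documents instead of tokens from a fixed vocabulary. Its training oracles are initialized with linguistic heuristics and then refined through iterative self-reinforcement.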
Findings
Outperforms standard models on OpenbookQA with 36.27% accuracy
Achieves a higher MAUVE score (81.58%) in open-ended generation
Attains best performance and lowest latency among retrieval baselines
Abstract
Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our…
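The selection step described in the abstract can be pictured with a small toy sketch. This is not the authors' implementation: it assumes, purely for illustration, that each candidate (a vocabulary token or a phrase taken from a document) has an embedding and that the next unit is the one with the largest dot product against a context vector. The names `Candidate`, `select_next`, the document id `doc-17`, and all vector values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    text: str                 # token or phrase that would be appended to the output
    vector: Sequence[float]   # toy embedding (hypothetical values)
    source: str               # "vocab" or a document id, usable for attribution


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_next(context_vector: Sequence[float],
                candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate whose embedding scores highest against the context."""
    return max(candidates, key=lambda c: dot(context_vector, c.vector))


# Candidate pool mixing standalone tokens with a phrase drawn from a document.
candidates = [
    Candidate("the", (0.1, 0.0), "vocab"),
    Candidate("Paris", (0.2, 0.6), "vocab"),
    Candidate("the capital of France", (0.3, 0.9), "doc-17"),
]

# Assumed encoding of the current prefix (hypothetical values).
context_vector = (0.5, 1.0)

best = select_next(context_vector, candidates)
print(best.text, "from", best.source)
```

Because every candidate carries its `source`, a phrase chosen this way can be traced back to the document it came from, which is the attribution benefit one of the reviews below points to.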
Peer Reviews
Decision: ICLR 2024 poster
The proposed approach seems very interesting and will be useful to the generation community. As I mentioned earlier, the linguistically inspired approach could be very useful in providing meaningful attributions to their sources. Strong results on a variety of benchmarks, from open-book QA to open-ended generation tasks.
Most of my concerns were adequately addressed in the authors' rebuttal. Please include these details in the camera-ready version if accepted. I have updated my reviews acco
The authors proposed a very interesting approach but I felt a lot of important details are missing. Please see my questions/comments below. It is unclear whether or not the code will be released from this work. Another weakness of the work I believe is that this approach will not be robust to languages or domains where our syntactic parsing capabilities are limited.
Strengths:
- The paper is well written and easy to follow.
- Their approach to selecting phrases is novel and introduces an interesting way of generating text.
- The authors study the effectiveness of their approach well and provide comparisons with other approaches.
- Their zero-shot results on knowledge-intensive tasks are convincing evidence of the effectiveness of their approach.
Weaknesses:
- Lack of any human evaluations: although there are automatic metrics for text generation, there is still a need to have humans judge the generations.
- The paper does not provide deep insights into the observed results. For example, in Section 6, Main Results, related to Table 4, it is not clear why the MAUVE score has such a huge jump for their method, or why finetuning the base model drops this score by a lot.
Strengths:
- A novel approach for retrieval-augmented generation
- Holistic evaluation not only by measuring the fluency in open-ended text generation but also by carrying out comprehensive evaluation in a wide range of knowledge-intensive tasks, such as open-domain question answering.
- Plug-and-play feature of the phrase index, as a way of adapting to out-of-domain distributions (such as the Medical domain) by simply changing/extending the phrase index with a domain-specific index without any further tr
Weaknesses:
- More implementation details regarding the size of the phrase index, etc., would be good to have in the paper.
- The work might also benefit from some discussion regarding scalability of the phrase index.
Minor suggestions:
- As Figure 1 is the main overview of the approach proposed in the paper, a more detailed footnote would be appreciated.
- Section 6 "Results" wouldn't be better under subsection 5.2.2, as it reflects on results from the Open-Ended Text Generation experiments.
- T
Taxonomy
Topics: AI-based Problem Solving and Planning · Speech and dialogue systems · Machine Learning and Algorithms
