Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods
Esmaeil Narimissa, David Raithel

TL;DR
This paper investigates different document splitting methods and introduces a novel evaluation technique for Retrieval-Augmented Generation systems, emphasizing the importance of document characteristics and proposing metrics for improved retrieval accuracy.
Contribution
It presents a new evaluation method using an open-source model and compares document splitting techniques, highlighting the Recursive Character Splitter's superiority in maintaining context.
Findings
Recursive Character Splitter outperforms Token-based Splitter in preserving contextual integrity
New evaluation dataset generated with an open-source model
Weighted metrics improve assessment of retrieval relevance
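To make the splitter comparison concrete, the following is a minimal sketch of the general idea behind recursive character splitting, not the authors' implementation or any particular library's API. The function names `recursive_split` and `token_split`, the separator order, and the use of whitespace tokens as a stand-in for a real tokenizer are all assumptions made for illustration.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Hypothetical sketch: try the coarsest separator first (paragraphs),
    and fall back to finer ones only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut at a fixed character width.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


def token_split(text, max_tokens):
    """Fixed-window split over whitespace tokens (a stand-in for a real tokenizer)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


sample = "Alpha beta.\n\nGamma delta epsilon zeta.\n\nEta."
print(recursive_split(sample, 15))
print(token_split(sample, 3))
```

In this sketch the recursive splitter never merges text across a paragraph break unless the result fits in one chunk, while the fixed token window cuts wherever the count runs out. That difference is one plausible reading of why a recursive approach would preserve context better.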
Abstract
The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's…
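The weighted scoring idea can be sketched as a normalized weighted average of per-metric scores. The example below is an illustration under stated assumptions, not the paper's scoring code: it uses `difflib.SequenceMatcher` from the Python standard library, substitutes a simple unigram precision for BLEU, and omits METEOR and BERT Score. The weights and the helper names `sequence_similarity`, `unigram_precision`, and `weighted_score` are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def sequence_similarity(reference, candidate):
    """Character-level similarity in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, reference, candidate).ratio()


def unigram_precision(reference, candidate):
    """Share of candidate words also found in the reference (clipped counts).

    A simplified stand-in for BLEU, which additionally uses higher-order
    n-grams and a brevity penalty.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / total


def weighted_score(scores, weights):
    """Weighted average of metric scores, normalized by the total weight."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


reference = "The recursive splitter preserves paragraph boundaries."
retrieved = "The recursive splitter keeps paragraph boundaries."

scores = {
    "sequence": sequence_similarity(reference, retrieved),
    "unigram": unigram_precision(reference, retrieved),
}
weights = {"sequence": 0.4, "unigram": 0.6}  # hypothetical weights
print(round(weighted_score(scores, weights), 3))
```

Normalizing by the total weight keeps the combined score in the same range as the individual metrics, so changing the weights shifts emphasis between metrics without rescaling the result.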
Taxonomy
Topics: Recommender Systems and Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Softmax · Layer Normalization · WordPiece · Dropout · Attention Dropout · BART · Dense Connections
