RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems
Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta

TL;DR
RUBEN is an interactive tool that uses rule-based explanations to interpret retrieval-augmented LLM outputs, aiding in safety testing and understanding model behavior.
Contribution
RUBEN introduces novel pruning strategies that efficiently discover a minimal set of rules explaining the outputs of retrieval-augmented LLMs in data-driven applications.
Findings
Efficient identification of minimal rule sets explaining LLM outputs.
Application of rules to test LLM safety and robustness.
Demonstration of rules' effectiveness against adversarial prompts.
Abstract
This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.
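To make the idea of a minimal rule set concrete, here is a minimal sketch in Python. It is not RUBEN's actual algorithm, which the abstract does not detail. It assumes a rule has the form "if these retrieved sources are in the context, the LLM produces this output", and it uses a simple Apriori-style, smallest-first search that skips any candidate already subsumed by a discovered rule. The names `minimal_rules`, `answer_fn`, and `toy_llm` are hypothetical, and `toy_llm` is a deterministic stand-in for a real retrieval-augmented LLM call.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


def minimal_rules(
    sources: Sequence[str],
    answer_fn: Callable[[FrozenSet[str]], str],
    target: str,
) -> List[FrozenSet[str]]:
    """Return the minimal source subsets for which answer_fn yields target.

    Candidates are enumerated from smallest to largest, so any candidate
    that contains an already-found rule is subsumed and can be skipped
    without calling the (expensive) model.
    """
    found: List[FrozenSet[str]] = []
    for size in range(1, len(sources) + 1):
        for combo in combinations(sources, size):
            candidate = frozenset(combo)
            # Pruning step: a superset of a known rule is not minimal.
            if any(rule <= candidate for rule in found):
                continue
            if answer_fn(candidate) == target:
                found.append(candidate)
    return found


def toy_llm(context: FrozenSet[str]) -> str:
    """Hypothetical stand-in for a retrieval-augmented LLM call."""
    if {"doc_a", "doc_b"} <= context or "doc_c" in context:
        return "yes"
    return "no"


rules = minimal_rules(["doc_a", "doc_b", "doc_c", "doc_d"], toy_llm, "yes")
for rule in rules:
    print(sorted(rule), "-> yes")
```

In this toy setup the search finds two rules, `{doc_c}` and `{doc_a, doc_b}`, and every larger combination is skipped because it contains one of them. The same loop could in principle be pointed at a safety question, for example by treating an injected adversarial passage as one of the sources and checking whether it appears in any rule for an unsafe output; that usage is an illustration, not a description of the tool's interface.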
