RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems

Joel Rorseth; Parke Godfrey; Lukasz Golab; Divesh Srivastava; Jarek Szlichta

arXiv:2605.10862 · cs.CL · May 12, 2026

TL;DR

RUBEN is an interactive tool that uses rule-based explanations to interpret retrieval-augmented LLM outputs, aiding in safety testing and understanding model behavior.

Contribution

The paper introduces novel pruning strategies that efficiently identify a minimal set of rules, one that subsumes all others, to explain retrieval-augmented LLM outputs in data-driven applications.

Findings

01

Efficient identification of minimal rule sets explaining LLM outputs.

02

Application of rules to LLM safety, specifically to test the resiliency of safety training.

03

Use of rules to test the effectiveness of adversarial prompt injections.

Abstract

This paper demonstrates RUBEN, an interactive tool for discovering minimal rules to explain the outputs of retrieval-augmented large language models (LLMs) in data-driven applications. We leverage novel pruning strategies to efficiently identify a minimal set of rules that subsume all others. We further demonstrate novel applications of these rules for LLM safety, specifically to test the resiliency of safety training and effectiveness of adversarial prompt injections.
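The idea of a minimal rule set that subsumes all others can be illustrated with a small sketch. This is not RUBEN's algorithm, since the abstract does not specify the rule form or the pruning strategies. The sketch assumes a rule is a set of retrieved sources whose presence yields a target output, and that a rule subsumes every superset of itself. The names `minimal_rules`, `toy_oracle`, and the `doc_*` identifiers are hypothetical, and `toy_oracle` stands in for an actual call to a retrieval-augmented LLM.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


def minimal_rules(
    sources: Sequence[str],
    produces_target: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Enumerate source subsets by increasing size and keep only minimal ones.

    Assumption: a "rule" is a set of sources that yields the target output,
    and any superset of a known rule is subsumed, so it is never evaluated.
    """
    found: List[FrozenSet[str]] = []
    for size in range(1, len(sources) + 1):
        for combo in combinations(sources, size):
            candidate = frozenset(combo)
            # Pruning step: skip candidates subsumed by a smaller rule.
            if any(rule <= candidate for rule in found):
                continue
            if produces_target(candidate):
                found.append(candidate)
    return found


# Hypothetical stand-in for the LLM: the target output appears when doc_a is
# present, or when doc_b and doc_c are present together.
calls = 0


def toy_oracle(present: FrozenSet[str]) -> bool:
    global calls
    calls += 1
    return "doc_a" in present or {"doc_b", "doc_c"} <= present


rules = minimal_rules(["doc_a", "doc_b", "doc_c", "doc_d"], toy_oracle)
print(sorted(sorted(rule) for rule in rules))
print("oracle calls:", calls)
```

In this toy setting the pruning check skips every candidate that contains an already discovered rule, so the oracle is consulted for fewer subsets than the 15 non-empty subsets of four sources. With a real LLM each oracle call is an inference request, which is why avoiding subsumed candidates matters.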
