ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
Matthew Penaroza

TL;DR
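This paper introduces reasoning graphs and retrieval graphs, which persist a language model agent's per-evidence judgments across runs. Evidence-centric feedback raises accuracy by up to +10.6pp at 50%+ evidence-profile coverage, and pruning consistently rejected candidates cuts cost and latency by 46% in high-reuse scenarios.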
Contribution
It presents ROZA graphs, a self-improving framework that persists per-evidence chains of thought and prunes consistently rejected candidates, improving accuracy and efficiency without retraining the base model.
Findings
Accuracy improves monotonically with evidence-profile coverage, reaching +10.6pp at 50%+ coverage.
4-hop reasoning accuracy improves by +11.0pp.
High-reuse scenarios achieve similar accuracy with 46% lower cost and latency.
Abstract
Language model agents reason from scratch on every query, discarding their chain of thought after each run. The result is lower accuracy and high run-to-run variance. We introduce reasoning graphs, which persist the per-evidence chain of thought as structured edges. Unlike prior memory that retrieves distilled strategies by query similarity, reasoning graphs enable evidence-centric feedback: for every candidate item, the system traverses all incoming evaluation edges across prior runs to surface how that specific item has been judged before. We further introduce retrieval graphs, which feed a planner that prunes consistently rejected candidates over successive runs. Together they form a ROZA graph: a self-improving feedback loop in which accuracy gains scale with gold-passage reuse (reasoning graph) and efficiency gains scale with candidate-pool overlap (retrieval graph). The base model…
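The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `EvaluationEdge`, `ReasoningGraph`, and `prune_candidates`, the boolean accept/reject verdict, and the `min_rejections` threshold are all assumptions made for illustration. The sketch stores one edge per judgment, looks up every incoming edge for a candidate item, and drops candidates that every recorded run rejected.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationEdge:
    """One persisted judgment (hypothetical schema): a run's verdict on an item."""
    run_id: str
    item_id: str
    accepted: bool
    rationale: str


class ReasoningGraph:
    """Stores evaluation edges indexed by the evidence item they point at."""

    def __init__(self):
        self._incoming = defaultdict(list)

    def record(self, edge):
        # Persist the judgment instead of discarding it after the run.
        self._incoming[edge.item_id].append(edge)

    def history(self, item_id):
        # Evidence-centric lookup: every prior judgment of this specific item.
        return list(self._incoming.get(item_id, []))


def prune_candidates(graph, candidates, min_rejections=3):
    """Drop items rejected in every recorded run, once enough runs exist.

    `min_rejections` is an assumed threshold, not a value from the paper.
    """
    kept = []
    for item_id in candidates:
        edges = graph.history(item_id)
        consistently_rejected = (
            len(edges) >= min_rejections and not any(e.accepted for e in edges)
        )
        if not consistently_rejected:
            kept.append(item_id)
    return kept
```

In this sketch, an item with no history is always kept, so pruning only takes effect as judgments accumulate across runs, which mirrors the claim that efficiency gains grow with candidate-pool overlap.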
