Tiny-Critic RAG: Empowering Agentic Fallback with Parameter-Efficient Small Language Models
Yichao Wu, Penghao Liang, Yafei Xiang, Mengwei Yuan, Jianan Liu, Jing Yang, Xianyou Li, Weiran Yan

TL;DR
Tiny-Critic RAG introduces a cost-effective, low-latency approach for agentic fallback in retrieval-augmented generation by using a parameter-efficient small language model as a deterministic gatekeeper, reducing reliance on large models.
Contribution
The paper presents Tiny-Critic RAG, a framework that uses a small language model adapted with LoRA to evaluate retrieval and make routing decisions, replacing the large models that agentic RAG systems otherwise rely on as evaluators.
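To make the gatekeeper idea concrete, here is a minimal sketch of binary routing driven by a small critic. It is not the authors' implementation: the names `RouteDecision`, `overlap_critic`, and `gatekeeper`, the 0.5 threshold, and the "generate"/"fallback" route labels are all assumptions for illustration, and the token-overlap scorer is only a stand-in for the LoRA-adapted SLM described in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RouteDecision:
    route: str    # "generate" or "fallback" (hypothetical labels)
    score: float  # critic score in [0, 1]


def overlap_critic(query: str, passages: List[str]) -> float:
    """Hypothetical stand-in for the SLM critic.

    Returns the fraction of query tokens found in the retrieved passages.
    A real system would call the LoRA-adapted small model here instead.
    """
    query_tokens = set(query.lower().split())
    if not query_tokens:
        return 0.0
    context_tokens = set(" ".join(passages).lower().split())
    return len(query_tokens & context_tokens) / len(query_tokens)


def gatekeeper(
    query: str,
    passages: List[str],
    critic: Callable[[str, List[str]], float] = overlap_critic,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> RouteDecision:
    """Deterministic binary routing: same inputs always give the same route."""
    score = critic(query, passages)
    route = "generate" if score >= threshold else "fallback"
    return RouteDecision(route=route, score=score)


if __name__ == "__main__":
    good = gatekeeper("capital of france", ["paris is the capital of france"])
    bad = gatekeeper("capital of france", ["bananas are yellow"])
    print(good.route, bad.route)
```

Because the critic is passed in as a plain callable, the large generator is only invoked after the cheap check has run, which is the decoupling the framework is built around.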
Findings
Achieves routing accuracy comparable to GPT-4o-mini.
Reduces latency by an order of magnitude.
Demonstrates high cost-effectiveness for agent deployment.
Abstract
Retrieval-Augmented Generation (RAG) grounds Large Language Models (LLMs) to mitigate factual hallucinations. Recent paradigms shift from static pipelines to Modular and Agentic RAG frameworks, granting models autonomy for multi-hop reasoning or self-correction. However, current reflective RAG heavily relies on massive LLMs as universal evaluators. In high-throughput systems, executing complete forward passes for billion-parameter models merely for binary routing introduces severe computational redundancy. Furthermore, in autonomous agent scenarios, inaccurate retrieval causes models to expend excessive tokens on spurious reasoning and redundant tool calls, inflating Time-to-First-Token (TTFT) and costs. We propose Tiny-Critic RAG, decoupling evaluation by deploying a parameter-efficient Small Language Model (SLM) via Low-Rank Adaptation (LoRA). Acting as a deterministic gatekeeper,…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
