Luna-2: Scalable Single-Token Evaluation with Small Language Models

Vatsal Goel; Rishon Dsouza; Nikhil Ega; Amey Ramesh Rambatla; Rob Friel; Shuai Shao; Yash Sheth

arXiv:2602.18583·cs.CL·February 24, 2026

Open Access

TL;DR

Luna-2 introduces a scalable, deterministic evaluation architecture using small language models with lightweight adapters, achieving high accuracy in task-specific metrics while significantly reducing cost and latency.

Contribution

The paper presents Luna-2, an architecture in which a small language model with specialized per-metric adapters evaluates AI system outputs quickly, accurately, and at low cost.

Findings

01. Matches state-of-the-art accuracy in safety and hallucination benchmarks

02. Reduces inference cost by over 80 times

03. Decreases latency by over 20 times

Abstract

Real-time guardrails require evaluation that is accurate, cheap, and fast, yet today's default, LLM-as-a-judge (LLMAJ), is slow, expensive, and operationally non-deterministic due to multi-token generation. We present Luna-2, a novel architecture that turns decoder-only small language models (SLMs) into a deterministic evaluation model that reliably computes complex task-specific LLMAJ metrics (e.g., toxicity, hallucination, and tool selection quality) with accuracy on par with or higher than LLMAJ using frontier LLMs, while drastically reducing the cost and latency of computation. Each metric is implemented as a lightweight LoRA/PEFT head on top of a shared SLM backbone, enabling hundreds of specialized metrics to run concurrently on a single GPU, deployable locally next to AI systems in a privacy-preserving and latency-optimizing manner. Across content safety and hallucination…
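The abstract's two central ideas, per-metric low-rank heads on a shared backbone and a score read from a single token rather than from generated text, can be illustrated with a toy sketch. This page does not say how Luna-2 actually reads out its score, so everything here is an assumption made for illustration: the two-token "yes"/"no" vocabulary, the matrices, the adapter names, and the helper functions are all hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def adapted_logits(hidden, W, A, B, scale=1.0):
    """Logits from a frozen shared projection W plus a low-rank update B @ A."""
    base = matvec(W, hidden)              # shared across all metrics
    delta = matvec(B, matvec(A, hidden))  # metric-specific, rank = len(A)
    return [b + scale * d for b, d in zip(base, delta)]

def single_token_score(logits, positive_id, negative_id):
    """P(positive) under a two-way softmax over the two label tokens."""
    return 1.0 / (1.0 + math.exp(logits[negative_id] - logits[positive_id]))

# Hypothetical toy setup: a 2-dimensional hidden state and a 2-token vocabulary.
NO, YES = 0, 1
hidden = [1.0, 2.0]            # stand-in for the backbone's final hidden state
W = [[0.5, 0.0], [0.0, 0.5]]   # shared projection, identical for every metric

# One rank-1 adapter per metric (made-up values, not trained weights).
adapters = {
    "toxicity": {"A": [[1.0, 0.0]], "B": [[0.0], [1.0]]},
    "hallucination": {"A": [[0.0, 1.0]], "B": [[1.0], [0.0]]},
}

# No sampling is involved, so the same input always yields the same score.
scores = {
    name: single_token_score(adapted_logits(hidden, W, ad["A"], ad["B"]), YES, NO)
    for name, ad in adapters.items()
}
```

In this sketch the shared hidden state is computed once and each metric adds only a small matrix product, which is one plausible reading of how many metrics could share a single GPU.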

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques