RTLC -- Research, Teach-to-Learn, Critique: A three-stage prompting paradigm inspired by the Feynman Learning Technique that lifts LLM-as-judge accuracy on JudgeBench with no fine-tuning

Andrea Morandi

arXiv:2605.13695·cs.CL·May 14, 2026

TL;DR

RTLC is a three-stage prompting paradigm, inspired by the Feynman Learning Technique, that raises LLM-as-judge accuracy on JudgeBench without fine-tuning (Claude 3.7 Sonnet: 64.6% to 78.6%).

Contribution

The paper presents a three-stage prompting recipe that improves LLM judge accuracy by combining a structured reasoning scaffold, an ensemble of sampled candidate verdicts, and a final self-critique pass, with no additional training, retrieval, or external tools.

Findings

01. RTLC improves Claude 3.7 Sonnet's accuracy on JudgeBench from 64.6% to 78.6%.

02. RTLC outperforms both self-consistency voting and single-shot prompts.

03. An ablation shows that the Teach-to-Learn scaffold adds 9.4 percentage points of accuracy.
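
For readers unfamiliar with the baseline named in the findings, self-consistency voting is usually implemented as a plain majority vote over sampled verdicts. The sketch below shows that vote together with the percentage-point arithmetic behind the headline numbers. The function names are illustrative assumptions, and the paper's exact baseline configuration is not given in this summary.

```python
from collections import Counter
from typing import Sequence


def majority_vote(verdicts: Sequence[str]) -> str:
    """Self-consistency baseline: return the most frequent verdict.

    Ties go to the verdict seen first, which is how Counter.most_common
    orders equal counts.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    return Counter(verdicts).most_common(1)[0][0]


def gain_in_points(before_pct: float, after_pct: float) -> float:
    """Absolute accuracy gain in percentage points, rounded to one decimal."""
    return round(after_pct - before_pct, 1)


if __name__ == "__main__":
    # Toy verdicts, not data from the paper.
    print(majority_vote(["A", "B", "A", "A", "B"]))
    # Gain implied by the reported 64.6% and 78.6% accuracies.
    print(gain_in_points(64.6, 78.6))
```

RTLC differs from this baseline in its last step: rather than counting votes, it passes the candidate set back to the model for a critique.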

Abstract

LLM-as-a-judge is now the default measurement instrument for open-ended generation, but on the public JudgeBench benchmark even strong instruction-tuned judges barely scrape past random on objective-correctness pairwise items. We introduce RTLC, a three-stage prompting recipe -- Research, Teach-to-Learn, Critique -- that promotes a single black-box LLM into an ensemble-of-thought judge with no fine-tuning, retrieval, or external tools. Stage 1 wraps the input in a fixed pedagogical scaffold porting the Feynman Learning Technique (study $\to$ teach $\to$ find gaps $\to$ simplify) into LLM prompting. Stage 2 draws N=10 independent candidate verdicts at temperature 0.4. Stage 3 acts as its own critic, cross-comparing the candidate set against the original question to emit one critiqued verdict at temperature 0. On JudgeBench-GPT (350 hard pairwise items), Claude 3.7 Sonnet's pairwise…
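
The three stages described in the abstract can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not the author's implementation: the `llm(prompt, temperature)` callable, the scaffold wording, the `Verdict:` output convention, and all function names are hypothetical. Only N=10, the sampling temperature of 0.4, and the critique temperature of 0 come from the abstract.

```python
from typing import Callable

# Hypothetical interface: llm(prompt, temperature) -> completion text.
LLM = Callable[[str, float], str]

# Assumed scaffold wording; the paper's actual prompt is not reproduced here.
SCAFFOLD = (
    "Study the question and both responses. Explain the correct solution as if "
    "teaching a novice, find the gaps in each response, then simplify.\n"
    "End with a line 'Verdict: A' or 'Verdict: B'.\n\n"
    "Question: {question}\n\nResponse A: {a}\n\nResponse B: {b}\n"
)


def parse_verdict(text: str) -> str:
    """Return 'A' or 'B' from the last 'Verdict:' line, or raise ValueError."""
    for line in reversed(text.strip().splitlines()):
        if line.strip().lower().startswith("verdict:"):
            label = line.split(":", 1)[1].strip().upper()
            if label in ("A", "B"):
                return label
    raise ValueError("no verdict found in model output")


def rtlc_judge(llm: LLM, question: str, a: str, b: str,
               n: int = 10, sample_temp: float = 0.4) -> str:
    # Stage 1: wrap the input in the fixed pedagogical scaffold.
    prompt = SCAFFOLD.format(question=question, a=a, b=b)
    # Stage 2: draw n independent candidate verdicts at a nonzero temperature.
    candidates = [llm(prompt, sample_temp) for _ in range(n)]
    # Stage 3: the same model critiques the candidate set against the
    # original question and emits one verdict at temperature 0.
    joined = "\n\n".join(f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates))
    critique_prompt = (
        "Compare the candidate judgments below against the original question, "
        "identify their errors, and give one final line 'Verdict: A' or "
        "'Verdict: B'.\n\n"
        f"Question: {question}\n\nResponse A: {a}\n\nResponse B: {b}\n\n{joined}\n"
    )
    return parse_verdict(llm(critique_prompt, 0.0))
```

Under these assumptions each judged item costs n + 1 model calls, so the default setting makes 11 calls per pairwise comparison.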
