Small Reward Models via Backward Inference
Yike Wang, Faeze Brahman, Shangbin Feng, Teng Xiao, Hannaneh Hajishirzi, Yulia Tsvetkov

TL;DR
This paper introduces FLIP, a novel reward modeling method that uses backward inference to evaluate responses without reference responses or rubrics, outperforming traditional methods especially in smaller models.
Contribution
FLIP reformulates reward modeling through backward inference, enabling reference-free, rubric-free evaluation that improves performance and robustness in small language models.
Findings
FLIP outperforms LLM-as-a-Judge baselines by 79.6% on average.
FLIP enhances downstream task performance with test-time scaling.
FLIP is effective for longer outputs and resistant to reward hacking.
Abstract
Reward models (RMs) play a central role throughout the language model (LM) pipeline, particularly in non-verifiable domains. However, the dominant LLM-as-a-Judge paradigm relies on the strong reasoning capabilities of large models, while alternative approaches require reference responses or explicit rubrics, limiting flexibility and broader accessibility. In this work, we propose FLIP (FLipped Inference for Prompt reconstruction), a reference-free and rubric-free reward modeling approach that reformulates reward modeling through backward inference: inferring the instruction that would most plausibly produce a given response. The similarity between the inferred and the original instructions is then used as the reward signal. Evaluations across four domains using 13 small language models show that FLIP outperforms LLM-as-a-Judge baselines by an average of 79.6%. Moreover, FLIP…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
