MARS: Margin-Aware Reward-Modeling with Self-Refinement
Payel Bhattacharjee, Osvaldo Simeone, Ravi Tandon

TL;DR
MARS introduces an adaptive augmentation method that focuses on uncertain preference pairs to improve reward model robustness, backed by theoretical guarantees and empirical gains.
Contribution
The paper proposes a margin-aware augmentation strategy that targets ambiguous (low-margin) preference pairs, enhancing reward model training efficiency and robustness.
Findings
Consistent performance improvements over uniform augmentation.
Theoretical guarantees on increased loss function curvature.
Enhanced reward model robustness through targeted data augmentation.
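A short worked equation may help connect low-margin pairs to the curvature finding. It assumes the standard Bradley-Terry pairwise loss, which this summary does not explicitly state; the symbols m, r_theta, y^+ and y^- are illustrative, and this is not the paper's theorem.

```latex
% Assumed pairwise loss on a preference pair with reward margin m
%   m = r_\theta(x, y^{+}) - r_\theta(x, y^{-})
\ell(m) = -\log \sigma(m), \qquad \sigma(m) = \frac{1}{1 + e^{-m}}

% First and second derivatives with respect to the margin
\frac{d\ell}{dm} = -\bigl(1 - \sigma(m)\bigr), \qquad
\frac{d^{2}\ell}{dm^{2}} = \sigma(m)\bigl(1 - \sigma(m)\bigr) \le \frac{1}{4}

% The bound is attained at m = 0, so pairs with margin near zero
% contribute the most per-sample curvature under this loss.
```

Under this assumption, adding samples near m = 0 adds the terms with the largest second derivative, which is one intuition for why margin-targeted augmentation could increase curvature.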
Abstract
Reward modeling is a core component of modern alignment pipelines including RLHF and RLAIF, underpinning policy optimization methods such as PPO and TRPO. However, training reliable reward models relies heavily on human-labeled preference data, which is costly and limited, motivating the use of data augmentation. Existing augmentation approaches typically operate at the representation or semantic level and remain agnostic to the reward model's estimation difficulty. In this paper, we propose MARS, an adaptive, margin-aware augmentation and sampling strategy that explicitly targets ambiguous cases and failure modes of the reward model. MARS concentrates augmentation on low-margin (ambiguous) preference pairs where the reward model is most uncertain, and iteratively refines the training distribution via hard-sample augmentation. We provide theoretical guarantees…
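The idea of concentrating augmentation on low-margin pairs can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Bradley-Terry preference model, and the function names, the top-k selection rule, and the toy reward scores are all hypothetical.

```python
import math


def preference_margin(r_chosen: float, r_rejected: float) -> float:
    """Reward margin between the preferred and the rejected response."""
    return r_chosen - r_rejected


def bt_probability(margin: float) -> float:
    """Bradley-Terry probability that the preferred response wins (assumed model).

    Uses a numerically stable sigmoid so large negative margins do not overflow.
    """
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def select_low_margin(pairs, k):
    """Return ids of the k pairs whose |margin| is smallest.

    pairs: list of (pair_id, r_chosen, r_rejected) tuples.
    Top-k by |margin| is an illustrative selection rule, not the paper's.
    """
    ranked = sorted(pairs, key=lambda p: abs(preference_margin(p[1], p[2])))
    return [p[0] for p in ranked[:k]]


# Toy reward-model scores (made up for illustration).
scored_pairs = [
    ("a", 2.0, -1.0),  # margin  3.0: confidently correct
    ("b", 0.3, 0.2),   # margin ~0.1: ambiguous
    ("c", -0.5, 0.0),  # margin -0.5: misranked, still close to the boundary
    ("d", 1.0, 0.0),   # margin  1.0
]

to_augment = select_low_margin(scored_pairs, k=2)
print(to_augment)  # ['b', 'c']
```

In an iterative scheme, the selected pairs would be augmented, the reward model retrained, and the margins recomputed before the next round; how augmentation itself is performed is left out here because the truncated abstract does not specify it.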
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
