PHUDGE: Phi-3 as Scalable Judge
Mahesh Deshwal, Apoorva Chawla

TL;DR
PHUDGE is a fine-tuned Phi-3 model that achieves state-of-the-art results on four evaluation tasks while beating existing judge models in latency and throughput, suggesting that simpler task formulations and a new loss function can let a small model outperform much larger ones.
Contribution
The paper introduces PHUDGE, a fine-tuned Phi-3 model built as a scalable judge that outperforms existing models in latency and throughput. It also proposes new training techniques, including the use of Wasserstein distance as a loss function.
Findings
PHUDGE surpasses all existing models in latency and throughput.
The model shows strong correlation with GPT-4 and human annotators.
Using Wasserstein distance improves training stability and results.
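A minimal sketch of why a Wasserstein loss suits grading, under assumptions not stated on this page: the judge outputs a probability distribution over five ordered score classes (1 to 5), and the target is a one-hot gold score. For distributions on ordered, unit-spaced bins, the 1-Wasserstein distance equals the sum of absolute differences between the two cumulative distributions, so a prediction that misses by one point costs less than one that misses by three. Cross-entropy, by contrast, only looks at the mass placed on the gold class. The helper names (`softmax`, `one_hot`, `wasserstein_1d`, `wasserstein_loss`) are hypothetical, this is plain Python without autograd, and the paper's exact formulation may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(score, num_classes=5):
    """Target distribution with all mass on a 1-based score label (assumed 1..5)."""
    return [1.0 if i == score - 1 else 0.0 for i in range(num_classes)]


def wasserstein_1d(p, q):
    """1-Wasserstein distance between two distributions over the same
    ordered, unit-spaced bins: the sum of |CDF_p - CDF_q| over the bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    cdf_p = cdf_q = dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist


def wasserstein_loss(logits, gold_score):
    """Hypothetical loss: W1 between softmax(logits) and the one-hot gold score."""
    return wasserstein_1d(softmax(logits), one_hot(gold_score, len(logits)))


if __name__ == "__main__":
    gold = one_hot(4)
    # Both predictions put 0.05 on the gold class, so cross-entropy is identical,
    # but W1 penalises the prediction centred further from score 4 more heavily.
    near = [0.05, 0.05, 0.80, 0.05, 0.05]  # most mass on score 3
    far = [0.80, 0.05, 0.05, 0.05, 0.05]   # most mass on score 1
    print(wasserstein_1d(near, gold))  # approximately 1.1
    print(wasserstein_1d(far, gold))   # approximately 2.6
```

In an actual training loop the same CDF difference would be computed with tensor operations so gradients flow through the softmax; the ordinal structure of the score labels is what makes this loss more informative than treating the scores as unrelated classes.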
Abstract
In this paper-cum-technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieved SOTA results on four tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput. It shows very strong correlation not only with GPT-4 but also with human annotators on unseen data, in both absolute and relative grading tasks. We not only address the use of small LMs for cost-effective, production-grade systems but also show that causal modelling is slow by nature, can sometimes hinder a model's learning capabilities, and should be replaced by simpler tasks whenever possible to make the overall system faster and better. We show that by following systematic ML experimentation, thoughtful data augmentation, and re-purposing the problem itself, we can beat 10x bigger models even with lesser…
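The correlation claims can be made concrete with a small sketch. The metric is not named on this page, so this assumes Pearson's r between the judge's scores and reference scores (from GPT-4 or human annotators) on the same set of items; the function name `pearson` and the sample scores are hypothetical and carry no results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and sums of squares; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Hypothetical 1-5 scores for four responses, not data from the paper.
    judge = [1, 2, 3, 4]
    reference = [1, 3, 2, 4]
    print(pearson(judge, reference))  # 0.8
```

Rank-based measures such as Spearman's rho or Kendall's tau are common alternatives when only the ordering of scores matters; which one the paper reports is not stated here.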
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
