The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
Samee Arif, Sualeha Farid, Abdul Hameed Azeemi, Awais Athar, Agha Ali Raza

TL;DR
This paper introduces a multi-model workflow for synthetic Preference Optimization dataset generation using LLMs for response evaluation and generation, demonstrating improved consistency and performance over single-model approaches.
Contribution
It proposes a novel multi-model pipeline combining LLMs for automated response evaluation and generation, optimizing dataset creation for preference modeling.
Findings
GPT-4o-as-a-Judge is most consistent for evaluation.
LLM Feedback Loop with Llama and Gemma outperforms single models.
Generated datasets show high quality and improved evaluation metrics.
Abstract
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process. PO dataset generation requires two modules: (1) response evaluation, and (2) response generation. In the response evaluation module, the responses from Large Language Models (LLMs) are evaluated and ranked, a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a two-step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. Our evaluation shows that GPT-4o-as-a-Judge is more…
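To make the difference between a single judge and a jury concrete, here is a minimal Python sketch of how pairwise verdicts might be aggregated into a PO record. It is an illustration under stated assumptions, not the authors' pipeline: the `Judge` signature, the majority-vote rule, the tie handling, and the `prompt`/`chosen`/`rejected` field names are all hypothetical, and the stub judges stand in for real LLM calls.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Assumption: a judge is any callable that sees (prompt, response_a, response_b)
# and returns "A" or "B". A real judge would wrap an LLM call.
Judge = Callable[[str, str, str], str]


def jury_verdict(judges: List[Judge], prompt: str, a: str, b: str) -> str:
    """Majority vote over the judges; returns "A", "B", or "tie".

    A single LLM-as-a-Judge setup is the special case len(judges) == 1.
    """
    votes = Counter(judge(prompt, a, b) for judge in judges)
    if votes["A"] == votes["B"]:
        return "tie"
    return "A" if votes["A"] > votes["B"] else "B"


def to_preference_record(
    prompt: str, a: str, b: str, verdict: str
) -> Optional[Dict[str, str]]:
    """Turn a verdict into a (prompt, chosen, rejected) record.

    Ties are dropped here; that is a design choice of this sketch.
    """
    if verdict == "tie":
        return None
    chosen, rejected = (a, b) if verdict == "A" else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Stub judges with fixed, deterministic preferences (placeholders for LLMs).
def prefers_longer(prompt: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


def prefers_shorter(prompt: str, a: str, b: str) -> str:
    return "A" if len(a) <= len(b) else "B"


if __name__ == "__main__":
    prompt = "Explain overfitting."
    a = "It is bad."
    b = "Overfitting means a model memorizes training data and generalizes poorly."
    verdict = jury_verdict([prefers_longer, prefers_longer, prefers_shorter], prompt, a, b)
    print(verdict, to_preference_record(prompt, a, b, verdict))
```

With an even-sized jury, ties become possible, so a real workflow would need an explicit tie-breaking policy rather than silently discarding those pairs.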
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Attention Is All You Need · Cosine Annealing · Parrot optimizer: Algorithm and applications to medical problems · Softmax · Linear Layer · Attention Dropout · Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning
