The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation

Samee Arif; Sualeha Farid; Abdul Hameed Azeemi; Awais Athar; Agha Ali Raza

arXiv:2408.08688 · cs.CL · August 18, 2025

TL;DR

This paper introduces a multi-model workflow for synthetic Preference Optimization dataset generation using LLMs for response evaluation and generation, demonstrating improved consistency and performance over single-model approaches.

Contribution

It proposes a novel multi-model pipeline combining LLMs for automated response evaluation and generation, optimizing dataset creation for preference modeling.

Findings

1. GPT-4o-as-a-Judge is most consistent for evaluation.
2. LLM Feedback Loop with Llama and Gemma outperforms single models.
3. Generated datasets show high quality and improved evaluation metrics.

Abstract

This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multi-model workflows. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process. PO dataset generation requires two modules: (1) response evaluation, and (2) response generation. In the response evaluation module, the responses from Large Language Models (LLMs) are evaluated and ranked, a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a two-step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. Our evaluation shows that GPT-4o-as-a-Judge is more…
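
The response evaluation module described in the abstract can be pictured with a small sketch. The Python below is an illustration only, not the authors' implementation: it assumes that a jury decides by simple majority vote over pairwise verdicts, and that a PO record uses the common prompt/chosen/rejected layout. The names `jury_verdict`, `to_preference_record`, and the three stand-in judges are hypothetical; in practice each judge would be a call to an LLM.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical judge signature: (prompt, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def jury_verdict(prompt: str, a: str, b: str, judges: List[Judge]) -> str:
    """Majority vote over pairwise verdicts (an assumed aggregation rule)."""
    votes = Counter(judge(prompt, a, b) for judge in judges)
    if votes["A"] == votes["B"]:
        return "tie"
    return "A" if votes["A"] > votes["B"] else "B"


def to_preference_record(
    prompt: str, a: str, b: str, judges: List[Judge]
) -> Optional[Dict[str, str]]:
    """Turn a jury verdict into a prompt/chosen/rejected record; skip ties."""
    verdict = jury_verdict(prompt, a, b, judges)
    if verdict == "tie":
        return None
    chosen, rejected = (a, b) if verdict == "A" else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Stand-in judges so the sketch runs without any model access.
def longer_wins(prompt: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"


def shorter_wins(prompt: str, a: str, b: str) -> str:
    return "A" if len(a) < len(b) else "B"


def mentions_prompt_word(prompt: str, a: str, b: str) -> str:
    key = prompt.split()[0].lower()
    return "A" if key in a.lower() else "B"


if __name__ == "__main__":
    record = to_preference_record(
        "Paris facts",
        "Paris is the capital of France.",
        "I don't know.",
        [longer_wins, shorter_wins, mentions_prompt_word],
    )
    print(record)
```

A single judge is the special case of a one-member jury. Ties are dropped rather than broken at random, which is one possible design choice for keeping ambiguous pairs out of the dataset.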

Code & Models

Repositories

ulrs0/MA-PO (Official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Attention Is All You Need · Cosine Annealing · Parrot optimizer: Algorithm and applications to medical problems · Softmax · Linear Layer · Attention Dropout · Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning