Dynamic Noise Preference Optimization: Self-Improvement of Large Language Models with Self-Synthetic Data

Haoyan Yang; Khiem Le; Ting Hua; Shangqian Gao; Binfeng Xu; Zheng Tang; Jie Xu; Nitesh V. Chawla; Hongxia Jin; Vijay Srinivasan

arXiv:2502.05400 · cs.CL · March 17, 2026

Open Access

TL;DR

This paper introduces Dynamic Noise Preference Optimization (DNPO), a novel method for improving large language models through self-synthetic data, dynamic sample labeling, and controlled noise injection, leading to continuous performance gains.

Contribution

The paper proposes DNPO, a new approach combining dynamic preference pair construction and trainable noise injection to enhance LLM self-improvement with synthetic data.
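The two ingredients named above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`build_pair`, `noisy_preference_loss`), the scoring function, the DPO-style logistic loss, and the choice to add the noise term to the rejected response's margin are all assumptions made for illustration. The noise is passed in as a plain scalar here; in a trainable setup it would be a learnable parameter in an autodiff framework.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_pair(prompt, reference, generated, score_fn):
    """Dynamic labeling (assumed rule): the higher-scoring response becomes
    'chosen', rather than always preferring the reference answer."""
    if score_fn(prompt, generated) > score_fn(prompt, reference):
        return PreferencePair(prompt, chosen=generated, rejected=reference)
    return PreferencePair(prompt, chosen=reference, rejected=generated)


def noisy_preference_loss(chosen_margin, rejected_margin, noise, beta=0.1):
    """DPO-style logistic loss with a noise term on the rejected margin.

    Each margin is log pi(y|x) - log pi_ref(y|x) for one response.
    Where the noise enters is an assumption of this sketch.
    """
    logit = beta * (chosen_margin - (rejected_margin + noise))
    z = -logit
    # Numerically stable softplus(z), equal to -log(sigmoid(logit)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical usage: score responses by length, then evaluate the loss.
pair = build_pair("q", "short", "a longer answer", lambda p, r: len(r))
loss = noisy_preference_loss(chosen_margin=1.0, rejected_margin=0.0, noise=0.5)
```

In this sketch a positive noise value shrinks the effective margin between the chosen and rejected responses, which raises the loss for the same pair.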

Findings

01. DNPO outperforms existing methods on multiple benchmarks.

02. With Zephyr-7B, DNPO achieves a 29.4% win-loss rate gap in data quality.

03. DNPO enables continued improvement across iterations without performance stagnation.

Abstract

Although LLMs have achieved significant success, their reliance on large volumes of human-annotated data has limited their potential for further scaling. In this situation, utilizing self-generated synthetic data has become crucial for fine-tuning LLMs without extensive human annotation. However, current methods often fail to ensure consistent improvements across iterations, with performance stagnating after only minimal updates. To overcome these challenges, we introduce Dynamic Noise Preference Optimization (DNPO), which combines dynamic sample labeling for constructing preference pairs with controlled, trainable noise injection during preference optimization. Our approach effectively prevents stagnation and enables continuous improvement. In experiments with Llama-3.2-3B and Zephyr-7B, DNPO consistently outperforms existing methods across multiple benchmarks. Additionally, with…

Taxonomy

Topics: Manufacturing Process and Optimization · Advanced Manufacturing and Logistics Optimization

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Absolute Position Encodings · Layer Normalization · Label Smoothing · Residual Connection · Adam · Softmax