UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation

Jiyu Guo; Shuo Yang; Yiming Huang; Yancheng Long; Xiaobo Xia; Xiu Su; Bo Zhao; Zeke Xie; Liqiang Nie

arXiv:2510.24262 · cs.CV · October 29, 2025

TL;DR

UtilGen introduces a utility-centric data augmentation framework that adaptively generates task-specific synthetic data by leveraging downstream task feedback, significantly improving model performance across multiple benchmarks.

Contribution

The paper proposes a novel dual-level optimization strategy for generative data augmentation that maximizes downstream task utility rather than visual quality alone.

Findings

01. Achieves an average accuracy improvement of 3.87% over SOTA methods.

02. Produces more impactful, task-relevant synthetic data.

03. Demonstrates effectiveness across eight benchmark datasets.

Abstract

Data augmentation using generative models has emerged as a powerful paradigm for enhancing performance in computer vision tasks. However, most existing augmentation approaches primarily focus on optimizing intrinsic data attributes -- such as fidelity and diversity -- to generate visually high-quality synthetic data, while often neglecting task-specific requirements. Yet, it is essential for data generators to account for the needs of downstream tasks, as training data requirements can vary significantly across different tasks and network architectures. To address these limitations, we propose UtilGen, a novel utility-centric data augmentation framework that adaptively optimizes the data generation process to produce task-specific, high-utility training data via downstream task feedback. Specifically, we first introduce a weight allocation network to evaluate the task-specific utility…
