POCA: Pareto-Optimal Curriculum Alignment for Visual Text Generation

Yaohou Fan; Qingzhong Wang; Yongsong Huang; Junyi Liu; Tomo Miyazaki; Shinichiro Omachi

arXiv:2604.24171 · cs.CV · April 28, 2026

TL;DR

POCA is a multi-objective framework for visual text generation. It seeks Pareto-optimal solutions across multiple rewards and adaptively manages the training curriculum, aiming for a better trade-off between text accuracy and overall image coherence.

Contribution

The paper introduces Pareto-Optimal Curriculum Alignment (POCA), a method that combines Pareto optimization with adaptive curriculum learning to improve visual text generation.
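The excerpt here does not say how POCA's adaptive curriculum chooses prompts. As a purely illustrative stand-in, the Python sketch below shows one generic adaptive-curriculum heuristic: keep a running success rate per prompt and train on the prompts closest to a target difficulty. The helper names (`update_rate`, `select_curriculum_prompts`), the 0.5 target, the moving-average update, and the example prompts are all hypothetical assumptions, not the paper's method.

```python
from typing import Dict, List


def update_rate(old: float, batch_accuracy: float, momentum: float = 0.9) -> float:
    """Exponential moving average of a prompt's success rate (hypothetical helper)."""
    return momentum * old + (1.0 - momentum) * batch_accuracy


def select_curriculum_prompts(
    success_rate: Dict[str, float], k: int, target: float = 0.5
) -> List[str]:
    """Return the k prompts whose current success rate is closest to `target`.

    Prompts the model already solves (rate near 1.0) or cannot solve at all
    (rate near 0.0) are deprioritized; ties are broken by prompt text.
    """
    ranked = sorted(success_rate, key=lambda p: (abs(success_rate[p] - target), p))
    return ranked[:k]


# Hypothetical per-prompt sentence accuracy estimated from recent rollouts
rates = {
    'a sign that reads "OPEN"': 0.95,
    'a poster titled "Summer Sale 2026"': 0.55,
    'a receipt listing five grocery items': 0.10,
    'a book cover with the title "Deep Water"': 0.40,
}
batch = select_curriculum_prompts(rates, k=2)
print(batch)  # the poster and book-cover prompts, the two closest to 0.5

# After a training step, refresh the estimate for a trained prompt
rates[batch[0]] = update_rate(rates[batch[0]], batch_accuracy=0.9)
```

Re-estimating success rates as training proceeds is what makes the selection adaptive: prompts drift out of the batch once they become easy. This speaks to the cost trade-off the abstract raises (too many prompts is expensive, too few hurts performance), but it is only one of many possible heuristics.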

Findings

01. POCA significantly improves CLIP scores, HPS scores, and sentence accuracy.

02. POCA balances multiple rewards without scalarizing them into a single objective.

03. POCA improves convergence in multi-reward training.
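To make "balancing multiple rewards without scalarization" concrete, here is a minimal Python sketch of Pareto dominance over per-sample reward vectors. It assumes higher is better for every reward; the reward names and values are hypothetical, and this is a generic non-dominated filter, not POCA's actual procedure.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if reward vector a Pareto-dominates b: no worse on every reward, better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of samples that no other sample dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical rewards per generated image: (text accuracy, aesthetic score, CLIP score)
samples = [
    (0.95, 0.40, 0.30),  # accurate text, weak aesthetics
    (0.60, 0.80, 0.32),  # nicer image, less accurate text
    (0.55, 0.75, 0.31),  # dominated by sample 1 on all three rewards
    (0.80, 0.70, 0.33),  # balanced
]
print(pareto_front(samples))  # [0, 1, 3]
```

No weights appear anywhere: a sample is kept or discarded by comparing whole reward vectors, so no single reward can swamp the others through a poorly chosen coefficient.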

Abstract

Current visual text generation models struggle with the trade-off between text accuracy and overall image coherence. We find that achieving high text accuracy can reduce aesthetic quality and instruction-following capability. Although reinforcement learning approaches can alleviate this problem by aligning the model with multiple rewards, they are often unstable for text generation, because existing approaches typically optimize the rewards as a weighted sum. In addition, it is difficult to balance the weight assigned to each reward. Moreover, reinforcement learning requires a set of training instructions (prompts): a large set demands more training time and computing resources, while a small set leads to poor performance. Hence, how to select prompts for efficient training remains an open problem. In this study, we propose Pareto-Optimal Curriculum Alignment (POCA), a framework that…
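The weight-balancing difficulty described in the abstract can be seen in a few lines. The Python sketch below uses hypothetical (text accuracy, aesthetic score) values for three samples, none of which dominates another, and shows that a weighted-sum objective picks a different "best" sample for each choice of weights. The function name `weighted_sum_best` is illustrative only.

```python
from typing import List, Sequence


def weighted_sum_best(scores: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the sample with the highest weighted sum of rewards."""
    totals = [sum(w * r for w, r in zip(weights, s)) for s in scores]
    return max(range(len(scores)), key=totals.__getitem__)


# Hypothetical (text accuracy, aesthetic score) per sample; all three are Pareto-optimal
samples = [(0.95, 0.40), (0.60, 0.80), (0.80, 0.70)]

for weights in [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]:
    print(weights, weighted_sum_best(samples, weights))
# selects sample 0, then sample 2, then sample 1
```

Each weighting lands on a different point of the same trade-off, so the optimizer's target shifts with a hyperparameter that is hard to set in advance. A linear weighting also can only reach points on the convex hull of the Pareto front, which is one general motivation for Pareto-based alternatives.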