POCA: Pareto-Optimal Curriculum Alignment for Visual Text Generation
Yaohou Fan, Qingzhong Wang, Yongsong Huang, Junyi Liu, Tomo Miyazaki, Shinichiro Omachi

TL;DR
POCA is a multi-objective framework for visual text generation that seeks Pareto-optimal solutions and adaptively manages the training curriculum to achieve better trade-offs between text accuracy and image coherence.
Contribution
It introduces Pareto-Optimal Curriculum Alignment (POCA), a novel method that combines Pareto optimization with adaptive curriculum learning for improved visual text generation.
Findings
POCA significantly improves CLIP and HPS scores as well as sentence accuracy.
It effectively balances multiple rewards without scalarization.
POCA enhances convergence in multi-reward training environments.
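To illustrate what balancing rewards "without scalarization" can mean, here is a minimal sketch of Pareto dominance next to a weighted-sum baseline. It is not the paper's implementation: the function names, the three reward dimensions, and the reward values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every reward and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(o, r) for j, o in enumerate(rewards) if j != i)
    ]


def weighted_sum_best(rewards: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Scalarized baseline: index of the candidate with the highest weighted sum."""
    scores = [sum(w * x for w, x in zip(weights, r)) for r in rewards]
    return max(range(len(rewards)), key=scores.__getitem__)


# Hypothetical reward vectors: (text accuracy, aesthetic score, prompt alignment).
candidates = [
    (0.90, 0.40, 0.60),  # accurate text, weaker aesthetics
    (0.60, 0.80, 0.70),  # better aesthetics, less accurate text
    (0.50, 0.30, 0.50),  # worse than candidate 0 on every reward
    (0.90, 0.40, 0.55),  # ties candidate 0 twice, loses on alignment
]

front = pareto_front(candidates)
pick_accuracy_heavy = weighted_sum_best(candidates, (0.8, 0.1, 0.1))
pick_aesthetic_heavy = weighted_sum_best(candidates, (0.2, 0.6, 0.2))
```

In this toy example the Pareto front keeps both non-dominated candidates, whereas the weighted-sum choice flips between them depending on the weights. That sensitivity to the weights is the balancing difficulty the abstract attributes to weighted-sum optimization.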
Abstract
Current visual text generation models struggle with the trade-off between text accuracy and overall image coherence. We find that achieving high text accuracy can reduce aesthetic quality and instruction-following capability. Although reinforcement learning approaches can alleviate the problem by aligning with multiple rewards, they are often unstable for text generation, because existing approaches normally optimize multiple rewards as a weighted sum, and the weight of each reward is difficult to balance. Moreover, reinforcement learning requires a set of training instructions: a large number of prompts requires more training time and computing resources, while a small set leads to poor performance. Hence, how to select prompts for efficient training remains an unsolved problem. In this study, we propose Pareto-Optimal Curriculum Alignment (POCA), a framework that…
