RMLer: Synthesizing Novel Objects across Diverse Categories via Reinforcement Mixing Learning
Jun Li, Zikun Chen, Haibo Chen, Shuo Chen, Jian Yang

TL;DR
RMLer introduces a reinforcement learning framework for synthesizing novel objects by effectively blending concepts from diverse categories in text-to-image generation, resulting in more coherent and high-quality outputs.
Contribution
The paper presents Reinforcement Mixing Learning (RMLer), a novel approach that formulates cross-category concept fusion as a reinforcement learning problem with dynamic blending strategies.
Findings
RMLer outperforms existing methods in generating coherent objects.
The framework achieves higher semantic similarity and compositional balance.
Experimental results demonstrate improved visual quality and diversity.
Abstract
Novel object synthesis by integrating distinct textual concepts from diverse categories remains a significant challenge in Text-to-Image (T2I) generation. Existing methods often suffer from insufficient concept mixing, lack of rigorous evaluation, and suboptimal outputs, manifesting as conceptual imbalance, superficial combinations, or mere juxtapositions. To address these limitations, we propose Reinforcement Mixing Learning (RMLer), a framework that formulates cross-category concept fusion as a reinforcement learning problem: mixed features serve as states, mixing strategies as actions, and visual outcomes as rewards. Specifically, we design an MLP-policy network to predict dynamic coefficients for blending cross-category text embeddings. We further introduce visual rewards based on (1) semantic similarity and (2) compositional balance between the fused object and its constituent…
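The state/action framing in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-layer MLP, the layer sizes, the sigmoid output, the per-token coefficient, and the names `init_policy`, `predict_coefficients`, and `blend` are all assumptions chosen for illustration, and random arrays stand in for real text-encoder embeddings. The reward step is omitted because it would require generating and scoring an image.

```python
import numpy as np

def init_policy(dim, hidden, rng):
    """Random weights for a small two-layer MLP (sizes are illustrative assumptions)."""
    return {
        "w1": rng.normal(0.0, 0.02, size=(dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.02, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def predict_coefficients(params, state):
    """Map the current mixed features (state) to one coefficient per token (action)."""
    hidden = np.tanh(state @ params["w1"] + params["b1"])  # (tokens, hidden)
    logits = hidden @ params["w2"] + params["b2"]          # (tokens, 1)
    return 1.0 / (1.0 + np.exp(-logits))                   # sigmoid keeps alpha in (0, 1)

def blend(emb_a, emb_b, alpha):
    """Convex combination of two text embeddings, weighted by alpha."""
    return alpha * emb_a + (1.0 - alpha) * emb_b

rng = np.random.default_rng(0)
tokens, dim = 4, 8                        # toy sizes, not those of a real text encoder
emb_a = rng.normal(size=(tokens, dim))    # stand-in for the first concept's embedding
emb_b = rng.normal(size=(tokens, dim))    # stand-in for the second concept's embedding

params = init_policy(dim, hidden=16, rng=rng)
state = blend(emb_a, emb_b, 0.5)          # assumed initial state: a plain average
alpha = predict_coefficients(params, state)
mixed = blend(emb_a, emb_b, alpha)        # next state after applying the action
```

In a full training loop, `mixed` would condition the T2I model, and the rewards computed on the generated image would be used to update `params`; that part is not sketched here.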
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
