Endogenous Reprompting: Self-Evolving Cognitive Alignment for Unified Multimodal Models

Zhenchen Tang; Songlin Yang; Zichuan Wang; Bo Peng; Yang Li; Beibei Dong; Jing Dong

arXiv:2601.20305·cs.AI·January 29, 2026

Open Access

TL;DR

This paper introduces Endogenous Reprompting and SEER, a training framework that lets a unified multimodal model evaluate its own outputs and reprompt itself during generation. It reports gains in accuracy, reprompting efficiency, and generation quality while training on only 300 samples.

Contribution

The paper presents an endogenous reprompting mechanism and SEER, a training framework that uses reinforcement learning to improve a multimodal model's self-evaluation and generation.

Findings

01. SEER outperforms state-of-the-art baselines in accuracy.

02. SEER improves reprompting efficiency and generation quality.

03. The approach requires only 300 samples for training.

Abstract

Unified Multimodal Models (UMMs) exhibit strong understanding, yet this capability often fails to effectively guide generation. We identify this as a Cognitive Gap: the model lacks the understanding of how to enhance its own generation process. To bridge this gap, we propose Endogenous Reprompting, a mechanism that transforms the model's understanding from a passive encoding process into an explicit generative reasoning step by generating self-aligned descriptors during generation. To achieve this, we introduce SEER (Self-Evolving Evaluator and Reprompter), a training framework that establishes a two-stage endogenous loop using only 300 samples from a compact proxy task, Visual Instruction Elaboration. First, Reinforcement Learning with Verifiable Rewards (RLVR) activates the model's latent evaluation ability via curriculum learning, producing a high-fidelity endogenous reward signal.…
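
To make the idea of a verifiable reward with curriculum ordering concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the evaluation format, so the pairwise "A"/"B" judgment, the `difficulty` score, and every name below (`EvalSample`, `verifiable_reward`, `curriculum`, `mean_reward`) are assumptions made for illustration only. It covers only the first stage the abstract describes, not the elided remainder.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalSample:
    """One hypothetical evaluation item (format assumed, not from the paper)."""
    instruction: str
    candidate_a: str
    candidate_b: str
    preferred: str      # assumed ground-truth label: "A" or "B"
    difficulty: float   # assumed curriculum score; lower means easier


def verifiable_reward(prediction: str, sample: EvalSample) -> float:
    """Return 1.0 if the evaluator's judgment matches the known label, else 0.0.

    The reward is "verifiable" in the sense that it is a deterministic
    check against a label rather than the output of a learned reward model.
    """
    return 1.0 if prediction.strip().upper() == sample.preferred else 0.0


def curriculum(samples: List[EvalSample]) -> List[EvalSample]:
    """Order samples from easy to hard (one simple curriculum choice)."""
    return sorted(samples, key=lambda s: s.difficulty)


def mean_reward(evaluator: Callable[[EvalSample], str],
                samples: List[EvalSample]) -> float:
    """Average verifiable reward of an evaluator over the curriculum order."""
    ordered = curriculum(samples)
    if not ordered:
        return 0.0
    rewards = [verifiable_reward(evaluator(s), s) for s in ordered]
    return sum(rewards) / len(rewards)
```

In an actual RLVR setup the per-sample reward would feed a policy-gradient update of the model acting as evaluator; the sketch stops at computing the reward because the abstract gives no detail on the optimizer or the reward design.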

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning