Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models

Lit Sin Tan; Junzhe Chen; Xiaolong Fu; Lichen Ma; Junshi Huang; Jianzhong Shi; Yan Li; Lijie Wen

arXiv:2603.15724 · cs.LG · March 18, 2026

Open Access

TL;DR

Meta-TTRL introduces a metacognitive test-time reinforcement learning framework that enables unified multimodal models to self-improve during inference, leading to significant performance gains across various tasks and models.

Contribution

The paper presents what it describes as the first comprehensive framework for test-time reinforcement learning in unified multimodal models, using model-intrinsic signals for self-optimization.

Findings

01. Meta-TTRL improves performance on compositional reasoning tasks.
02. It achieves significant gains on multiple T2I benchmarks.
03. The framework generalizes across different UMM architectures.

Abstract

Existing test-time scaling (TTS) methods for unified multimodal models (UMMs) in text-to-image (T2I) generation primarily rely on search or sampling strategies that produce only instance-level improvements, limiting the ability to learn from prior inferences and accumulate knowledge across similar prompts. To overcome these limitations, we propose Meta-TTRL, a metacognitive test-time reinforcement learning framework. Meta-TTRL performs test-time parameter optimization guided by model-intrinsic monitoring signals derived from the meta-knowledge of UMMs, achieving self-improvement and capability-level improvement at test time. Extensive experiments demonstrate that Meta-TTRL generalizes well across three representative UMMs, including Janus-Pro-7B, BAGEL, and Qwen-Image, achieving significant gains on compositional reasoning tasks and multiple T2I benchmarks with limited data. We provide…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling