TL;DR
This paper introduces Activation Replay, a training-free method that enhances reasoning in large multimodal models (LMMs) by manipulating low-entropy activations at test time, improving performance across various tasks.
Contribution
The paper proposes Activation Replay, a novel approach that boosts reasoning in post-trained LMMs by replaying low-entropy activations, without requiring expensive policy optimization.
Findings
Activation Replay improves reasoning in diverse scenarios.
It boosts Pass@K and reasoning coverage.
Replaying low-entropy activations outperforms the alternative approaches compared in the paper.
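The Pass@K finding refers to the probability that at least one of K sampled responses to a problem is correct. The summary does not say how the paper computes it, so the following is a minimal sketch assuming the widely used unbiased estimator: draw n ≥ K samples per problem, count the c correct ones, and compute pass@K = 1 − C(n−c, K) / C(n, K). The function names and sample counts are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the n samples contains at least one correct sample.
        return 1.0
    # 1 - P(a uniformly random size-k subset contains no correct sample)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as (n_samples, n_correct)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# Hypothetical counts: 10 samples per problem, with 3 and 0 correct answers.
print(pass_at_k(10, 3, 1))                        # ≈ 0.3
print(pass_at_k(10, 3, 5))                        # ≈ 0.9167
print(mean_pass_at_k([(10, 3), (10, 0)], k=5))    # ≈ 0.4583
```

Computing the estimate from combinatorics rather than literally resampling K responses avoids extra variance; a method that raises Pass@K is making correct answers reachable within a larger sampling budget.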
Abstract
Recently, Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach to incentivizing reasoning capabilities in Large Multimodal Models (LMMs), yet the mechanisms underlying this post-training paradigm remain poorly understood. We begin by exploring how input activations are affected by RLVR from the perspective of the logit lens. Our systematic investigations across multiple post-trained LMMs suggest that RLVR unexpectedly shifts low-entropy activations, while high-entropy ones are less affected. Through controlled experiments, we further demonstrate that these phenomena are associated with LMM reasoning, suggesting a potentially beneficial role for modulating low-entropy activations. To this end, we propose Activation Replay, a novel, simple, yet effective training-free approach that boosts multimodal reasoning of post-trained LMMs without requiring expensive…
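The abstract's analysis reads activations through the logit lens: an intermediate hidden state is passed through the model's final normalization and unembedding matrix, and the entropy of the resulting token distribution indicates how concentrated that activation's prediction is. The summary gives no implementation details, so this is a minimal, stdlib-only sketch under stated assumptions: the final norm is modeled as a weightless RMSNorm, `W_U` is a toy unembedding matrix, and the low/high split uses a hypothetical fixed `threshold`. All function names are illustrative, and the sketch covers only the entropy analysis, not Activation Replay itself.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def rms_norm(h: Sequence[float], eps: float = 1e-6) -> Vector:
    # Assumption: a weightless RMSNorm stands in for the model's real final norm.
    scale = 1.0 / math.sqrt(sum(x * x for x in h) / len(h) + eps)
    return [x * scale for x in h]


def logit_lens(h: Sequence[float], W_U: Sequence[Sequence[float]]) -> Vector:
    # Decode a hidden state into vocabulary logits; W_U has shape (vocab, hidden).
    z = rms_norm(h)
    return [sum(w * x for w, x in zip(row, z)) for row in W_U]


def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


def lens_entropies(hidden_states, W_U) -> Vector:
    # Logit-lens entropy for each hidden state (e.g. one per token position).
    return [entropy(softmax(logit_lens(h, W_U))) for h in hidden_states]


def split_by_entropy(entropies, threshold) -> Tuple[List[int], List[int]]:
    # Hypothetical rule: positions below the threshold count as "low-entropy".
    low = [i for i, e in enumerate(entropies) if e < threshold]
    high = [i for i, e in enumerate(entropies) if e >= threshold]
    return low, high


# Toy example: hidden size 2, vocabulary of 3 tokens.
W_U = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
ents = lens_entropies([[1.0, 0.0], [0.0, 0.0]], W_U)
print(ents)                          # first ≈ 0 (peaked), second ≈ ln 3 (uniform)
print(split_by_entropy(ents, 0.5))   # ([0], [1])
```

A real LMM would use its own final-norm weights and a vocabulary-sized `W_U`; the threshold could equally be a percentile over positions. Comparing these entropies before and after RLVR is one way to probe which activations the post-training shifts.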
