SAM2RL: Towards Reinforcement Learning Memory Control in Segment Anything Model 2

Alen Adamyan; Tomáš Čížek; Matej Straka; Klara Janouskova; Martin Schmid

arXiv:2507.08548·cs.CV·July 14, 2025

TL;DR

This paper introduces a reinforcement learning approach to optimizing memory updates in Segment Anything Model 2 (SAM 2). By treating memory control as a sequential decision-making problem, it improves object tracking performance well beyond the gains of existing heuristic update rules, in an overfitting setup with a separate agent per video.

Contribution

It presents a novel reinforcement learning method for memory management in SAM 2 that outperforms existing heuristic-based update rules in object tracking.

Findings

01

Reinforcement learning-based memory control achieves a relative improvement over SAM 2 more than three times larger than that of existing heuristics.

02

The approach enhances temporal consistency in video object tracking.

03

The results demonstrate the potential of RL to unlock the untapped capabilities of the memory bank.

Abstract

Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks and has become the state of the art in visual object tracking. The model stores information from previous frames in a memory bank, enabling temporal consistency across video sequences. Recent methods augment SAM 2 with hand-crafted update rules to better handle distractors, occlusions, and object motion. We propose a fundamentally different approach using reinforcement learning for optimizing memory updates in SAM 2 by framing memory control as a sequential decision-making problem. In an overfitting setup with a separate agent per video, our method achieves a relative improvement over SAM 2 that is more than three times the gains of existing heuristics. These results reveal the untapped potential of the memory bank and highlight reinforcement learning as a powerful alternative to…
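To make the "memory control as sequential decision-making" framing concrete, here is a toy sketch in Python. It is not the authors' implementation and does not use SAM 2: the environment, the action space (skip the current frame or overwrite a chosen memory slot), the reward (share of stored frames that are unoccluded, a stand-in for segmentation quality), the bank size, and all names (`ToyMemoryEnv`, `fifo_policy`, `skip_occluded_policy`, `rollout`) are hypothetical. A first-in-first-out rule plays the role of a fixed heuristic, and a hand-coded policy stands in for the learned agent that the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

SKIP = -1  # hypothetical action: do not write the current frame to memory


@dataclass
class ToyMemoryEnv:
    """Toy memory-control MDP (hypothetical; not SAM 2's API or the paper's setup).

    State:  frame indices currently held in the memory bank (self.slots).
    Action: SKIP, or the slot index to overwrite once the bank is full.
    Reward: share of stored frames that are unoccluded, a stand-in for
            the segmentation quality a real reward would measure.
    """
    occluded: Sequence[bool]   # per-frame flag, assumed observable in this toy
    capacity: int = 6          # arbitrary illustrative bank size
    slots: List[int] = field(default_factory=list)

    def step(self, frame: int, action: int) -> float:
        if action != SKIP:
            if len(self.slots) < self.capacity:
                self.slots.append(frame)      # bank still filling: append
            else:
                self.slots[action] = frame    # bank full: overwrite chosen slot
        if not self.slots:
            return 0.0
        clean = sum(not self.occluded[f] for f in self.slots)
        return clean / len(self.slots)


def fifo_policy(env: ToyMemoryEnv, frame: int) -> int:
    """Fixed heuristic: always store the frame, evicting the oldest entry."""
    if len(env.slots) < env.capacity:
        return 0  # slot index is ignored while the bank is filling
    return min(range(len(env.slots)), key=lambda i: env.slots[i])


def skip_occluded_policy(env: ToyMemoryEnv, frame: int) -> int:
    """Hand-coded stand-in for a learned agent: skip unreliable frames."""
    if env.occluded[frame]:
        return SKIP
    return fifo_policy(env, frame)


def rollout(policy: Callable[[ToyMemoryEnv, int], int],
            occluded: Sequence[bool], capacity: int = 6) -> float:
    """Run one episode over the video and return the summed reward."""
    env = ToyMemoryEnv(occluded=occluded, capacity=capacity)
    return sum(env.step(t, policy(env, t)) for t in range(len(occluded)))


if __name__ == "__main__":
    video = [False] * 5 + [True] * 4 + [False] * 3  # clean, occluded, clean
    print("FIFO return:", rollout(fifo_policy, video))
    print("Skip return:", rollout(skip_occluded_policy, video))
```

On this 12-frame toy video, the FIFO rule returns about 8.5 (up to floating-point rounding) while the skip policy returns 12.0, because FIFO writes the occluded frames into the bank and keeps them for several steps. In a real setting the occlusion flag is not given; an RL agent would have to learn when to skip or which slot to replace from reward alone.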
