3AM: 3egment Anything with Geometric Consistency in Videos

Yang-Che Sun; Cheng Sun; Chin-Yang Lin; Fu-En Yang; Min-Hung Chen; Yen-Yu Lin; Yu-Lun Liu

arXiv:2601.08831 · cs.CV · April 17, 2026

TL;DR

3AM enhances video object segmentation by integrating 3D-aware features into SAM2. It achieves geometry-consistent recognition without requiring camera poses or preprocessing, and it outperforms state-of-the-art VOS methods by 15.9 IoU points on ScanNet++.

Contribution

The paper introduces 3AM, a novel training-time enhancement that fuses 3D-aware features with appearance features for improved geometry consistency in video segmentation.

Findings

01 Achieves 90.6% IoU on the ScanNet++ dataset.

02 Outperforms state-of-the-art VOS methods by 15.9 IoU points.

03 Requires only RGB input at inference, with no camera poses or preprocessing.
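The findings above are reported in IoU (intersection over union). As a reminder of what that metric measures, here is a minimal sketch of mask IoU for a single frame and its mean over a video. The function names `mask_iou` and `mean_video_iou` are hypothetical, and the empty-mask convention is an assumption; the paper's exact evaluation protocol is not described on this page.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_video_iou(preds, gts) -> float:
    """Average the per-frame IoU over a sequence of (pred, gt) mask pairs."""
    return float(np.mean([mask_iou(p, g) for p, g in zip(preds, gts)]))

# Toy example: two 4x4 masks that overlap on one row.
pred = np.zeros((4, 4), dtype=bool)
pred[:2, :] = True   # rows 0-1, 8 pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, :] = True    # rows 1-2, 8 pixels
frame_iou = mask_iou(pred, gt)  # intersection 4 pixels, union 12 pixels
```

In the toy example the masks share 4 pixels out of a 12-pixel union, so the IoU is 1/3. A reported 90.6% IoU means the predicted and ground-truth masks overlap on roughly nine tenths of their combined area on average.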

Abstract

Video object segmentation methods like SAM2 achieve strong performance through memory-based architectures but struggle under large viewpoint changes due to reliance on appearance features. Traditional 3D instance segmentation methods address viewpoint consistency but require camera poses, depth maps, and expensive preprocessing. We introduce 3AM, a training-time enhancement that integrates 3D-aware features from MUSt3R into SAM2. Our lightweight Feature Merger fuses multi-level MUSt3R features that encode implicit geometric correspondence. Combined with SAM2's appearance features, the model achieves geometry-consistent recognition grounded in both spatial position and visual similarity. We propose a field-of-view aware sampling strategy ensuring frames observe spatially consistent object regions for reliable 3D correspondence learning. Critically, our method requires only RGB input at…
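To make the idea of fusing multi-level geometry features with appearance features more concrete, here is a rough NumPy sketch. It is not the paper's Feature Merger: the function names `resize_nearest` and `merge_features`, the nearest-neighbour resizing, the 1x1 projection, the residual addition, and all tensor shapes are assumptions chosen for illustration only.

```python
import numpy as np

def resize_nearest(feat: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) feature map to (C, out_h, out_w)."""
    _, h, w = feat.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[:, rows[:, None], cols[None, :]]

def merge_features(geo_levels, appearance, proj):
    """Hypothetical fusion: resize each geometry level to the appearance
    resolution, concatenate along channels, project with a 1x1 linear map,
    and add the result to the appearance features as a residual.

    geo_levels: list of (C_i, H_i, W_i) arrays (stand-ins for 3D-aware features)
    appearance: (C, H, W) array (stand-in for appearance features)
    proj:       (C, sum(C_i)) weight matrix acting as a 1x1 convolution
    """
    hw = appearance.shape[1:]
    stacked = np.concatenate([resize_nearest(f, hw) for f in geo_levels], axis=0)
    projected = np.einsum("oc,chw->ohw", proj, stacked)
    return appearance + projected

rng = np.random.default_rng(0)
geo_levels = [rng.standard_normal((8, 4, 4)), rng.standard_normal((16, 8, 8))]
appearance = rng.standard_normal((32, 8, 8))
proj = rng.standard_normal((32, 24)) * 0.01
fused = merge_features(geo_levels, appearance, proj)
```

The fused map keeps the appearance resolution and channel count, so under these assumptions it could be passed to downstream components without changing their input shape. With a zero projection the output equals the appearance features, which is one common reason to use a residual form when extending a pretrained model.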

Code & Models

Repositories

jayisaking/3AM-Page (GitHub)
