Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation
Anthony Opipari, Aravindhan K Krishnan, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo, Arnie Sen, Odest Chadwicke Jenkins

TL;DR
This paper augments class-agnostic video segmentation models with an explicit, object-level memory built by online 3D Gaussian Splatting, which makes their predictions more accurate and more consistent.
Contribution
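The authors develop an online 3D Gaussian Splatting (3DGS) technique that stores the object segments predicted over the course of a video, together with two fusion methods, FastSAM-Splat and SAM2-Splat, that use this explicit memory to improve the predictions of FastSAM and SAM2.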
Findings
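Models augmented with explicit 3D memory outperform their base models, whether the base uses no object-level memory (FastSAM) or an implicit recurrent memory (SAM2).
The proposed methods improve accuracy and consistency on both real-world and simulated benchmarks.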
Ablation studies validate the effectiveness of the memory and fusion techniques.
Abstract
Remembering where object segments were predicted in the past is useful for improving the accuracy and consistency of class-agnostic video segmentation algorithms. Existing video segmentation algorithms typically use either no object-level memory (e.g., FastSAM) or implicit memories in the form of recurrent neural network features (e.g., SAM2). In this paper, we augment both types of segmentation models using an explicit 3D memory and show that the resulting models have more accurate and consistent predictions. For this, we develop an online 3D Gaussian Splatting (3DGS) technique to store predicted object-level segments generated throughout the duration of a video. Based on this 3DGS representation, a set of fusion techniques is developed, named FastSAM-Splat and SAM2-Splat, that use the explicit 3DGS memory to improve their respective foundation models' predictions. Ablation…
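To make the idea of an explicit memory concrete, the following is a minimal sketch under strong simplifying assumptions, and it is not the authors' method. It stores labeled 3D points rather than 3D Gaussians, renders them with a plain pinhole projection, and fuses by majority vote, which is a stand-in for the fusion techniques the abstract names. The names `PointSegmentMemory`, `render`, and `fuse_with_memory`, and all parameters, are hypothetical.

```python
from collections import Counter


class PointSegmentMemory:
    """Toy explicit 3D memory: labeled points standing in for labeled Gaussians."""

    def __init__(self):
        self.points = []  # (x, y, z, label) tuples in the world frame

    def insert(self, xyz, label):
        x, y, z = xyz
        self.points.append((float(x), float(y), float(z), int(label)))

    def render(self, fx, fy, cx, cy, height, width, R, t):
        """Project stored points into a pinhole camera with world-to-camera
        rotation R (3x3 nested list) and translation t (length 3).
        Returns a height x width label image, where 0 means no memory."""
        labels = [[0] * width for _ in range(height)]
        depth = [[float("inf")] * width for _ in range(height)]
        for x, y, z, label in self.points:
            p = (x, y, z)
            cam = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if cam[2] <= 1e-6:  # skip points behind the camera
                continue
            u = round(fx * cam[0] / cam[2] + cx)
            v = round(fy * cam[1] / cam[2] + cy)
            # Keep only the nearest point at each pixel.
            if 0 <= u < width and 0 <= v < height and cam[2] < depth[v][u]:
                depth[v][u] = cam[2]
                labels[v][u] = label
        return labels


def fuse_with_memory(pred, memory):
    """Relabel each predicted segment with the memory ID it overlaps most.
    Segments with no memory support receive fresh IDs (assumption: fresh IDs
    start above the largest ID visible in the rendered memory image)."""
    height, width = len(pred), len(pred[0])
    votes = {}
    for v in range(height):
        for u in range(width):
            if pred[v][u] and memory[v][u]:
                votes.setdefault(pred[v][u], Counter())[memory[v][u]] += 1
    next_id = max(max(row) for row in memory) + 1
    mapping = {}
    fused = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            seg = pred[v][u]
            if seg == 0:
                continue
            if seg not in mapping:
                if seg in votes:
                    mapping[seg] = votes[seg].most_common(1)[0][0]
                else:
                    mapping[seg] = next_id
                    next_id += 1
            fused[v][u] = mapping[seg]
    return fused
```

In this sketch, a per-frame segmenter's output would be passed as `pred`, and the rendered memory supplies IDs that stay stable across frames. That is one simple way to see how an explicit memory can improve consistency, independent of how the paper's fusion actually works.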
