TL;DR
GHOST is a training-free framework for streaming 3D reconstruction that evicts redundant KV cache tokens based on 3D geometry, cutting cache size by nearly 50% and running inference 1.75x faster than existing methods.
Contribution
GHOST introduces a geometry-aware, hierarchical token eviction method that makes 3D reconstruction more efficient without retraining while maintaining reconstruction quality.
Findings
Reduces KV cache size by nearly 50%.
Achieves 1.75x faster inference than existing methods.
Maintains high reconstruction quality across benchmarks.
Abstract
Streaming 3D reconstruction from long monocular video sequences requires maintaining a key-value (KV) cache that grows linearly with sequence length, creating a severe memory bottleneck. Existing approaches either truncate the cache to a fixed set of anchor frames, leading to reconstruction quality degradation, or rely on attention-score heuristics that are agnostic to 3D scene structure, failing to preserve geometrically valuable tokens. To address these problems, we present GHOST (Geometry-Hierarchical Online Streaming Token Eviction), a training-free KV cache management framework that exploits the model's own 3D geometry outputs to evict redundant tokens online. GHOST introduces three mutually reinforcing innovations: a hierarchical dual-level importance scoring scheme, a privilege mechanism that protects special tokens from eviction, and a cosine-similarity-guided layer-wise budget…
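As a rough illustration of the kind of eviction step the abstract describes, the sketch below keeps a fixed budget of cached tokens in one layer: tokens flagged as privileged are always kept, and the remaining slots go to the highest-scoring tokens. This is not GHOST's implementation. The function name `evict_tokens`, its parameters, and the assumption that per-token importance scores and a per-layer budget are already available are all hypothetical; how GHOST derives its scores and budgets is not shown in the excerpt above.

```python
import numpy as np

def evict_tokens(scores, protected, budget):
    """Return sorted indices of cached tokens to keep (hypothetical helper).

    scores:    (n,) importance score per cached token, higher means keep.
               Assumed to be given; the scoring rule is not modeled here.
    protected: (n,) boolean mask of privileged tokens that are never evicted.
    budget:    target number of tokens to keep in this layer.
    """
    scores = np.asarray(scores, dtype=float)
    protected = np.asarray(protected, dtype=bool)

    # Privileged tokens are kept unconditionally, even if they exceed the budget.
    keep = np.flatnonzero(protected)
    free = max(budget - keep.size, 0)

    # Rank the remaining tokens by descending score; a stable sort keeps
    # the earlier token when two scores tie.
    candidates = np.flatnonzero(~protected)
    order = candidates[np.argsort(-scores[candidates], kind="stable")]

    # Return indices in cache order so keys and values can be gathered directly.
    return np.sort(np.concatenate([keep, order[:free]]))

# Example: five cached tokens, token 0 is privileged, budget of three.
kept = evict_tokens([0.9, 0.1, 0.5, 0.3, 0.8],
                    [True, False, False, False, False],
                    budget=3)
print(kept)  # [0 2 4]
```

The returned indices could then be used to gather the surviving rows of a layer's key and value tensors. Letting privileged tokens exceed the budget is a design choice of this sketch, not something the abstract specifies.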
