Compressed Map Priors for 3D Perception

Brady Zhou; Philipp Krähenbühl

arXiv:2601.00139·cs.CV·January 5, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Compressed Map Priors, a memory-efficient framework that leverages historic traversals to improve 3D perception in autonomous systems with minimal computational overhead.

Contribution

It presents a novel, highly compact map prior method that significantly enhances 3D object detection in autonomous perception systems.

Findings

01

20× reduction in map storage size relative to dense storage (32 KB/km²)

02

Significant improvement in 3D object detection accuracy

03

Easy integration with existing perception architectures

Abstract

Human drivers rarely travel where no person has gone before. After all, thousands of drivers use busy city roads every day, and only one can claim to be the first. The same holds for autonomous computer vision systems. The vast majority of the deployment area of an autonomous vision system will have been visited before. Yet, most autonomous vehicle vision systems act as if they are encountering each location for the first time. In this work, we present Compressed Map Priors (CMP), a simple but effective framework to learn spatial priors from historic traversals. The map priors use a binarized hashmap that requires only $32\,\mathrm{KB}/\mathrm{km}^{2}$, a $20\times$ reduction compared to the dense storage. Compressed Map Priors easily integrate into leading 3D perception systems at little to no extra computational costs, and lead to a significant and consistent improvement in 3D object…
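To make the abstract's storage figures concrete, the following is a minimal sketch of a binarized, hashed map prior, not the authors' implementation. The abstract does not specify the hash function, cell size, or channel count, so `BinaryHashPrior`, the 1 m cell size, the 8 binary channels per slot, the prime-XOR spatial hash, and the convention 1 KB = 1024 bytes are all assumptions made for illustration.

```python
# Illustrative sketch only: every name and parameter here is hypothetical.
KB = 1024                                # assumption: 1 KB = 1024 bytes
BITS_PER_KM2 = 32 * KB * 8               # 32 KB/km^2 expressed in bits
DENSE_BYTES_PER_KM2 = 20 * 32 * KB       # dense size implied by the 20x figure


class BinaryHashPrior:
    """One byte per hash slot, read as 8 binary (+1/-1) channels."""

    def __init__(self, num_bits, cell_m=1.0):
        self.cell_m = cell_m                  # assumed grid resolution in metres
        self.table = bytearray(num_bits // 8)
        self.num_slots = len(self.table)

    def _slot(self, x_m, y_m):
        # Quantize world coordinates to integer cell indices (floor division).
        ix = int(x_m // self.cell_m)
        iy = int(y_m // self.cell_m)
        # Prime-XOR spatial hash (an assumed choice); distinct cells may collide.
        return ((ix * 73856093) ^ (iy * 19349663)) % self.num_slots

    def write(self, x_m, y_m, signs):
        # Pack the sign of each of the 8 channels into one byte.
        byte = 0
        for c, s in enumerate(signs[:8]):
            if s > 0:
                byte |= 1 << c
        self.table[self._slot(x_m, y_m)] = byte

    def read(self, x_m, y_m):
        # Unpack the stored byte back into 8 values in {-1, +1}.
        byte = self.table[self._slot(x_m, y_m)]
        return [1 if (byte >> c) & 1 else -1 for c in range(8)]


prior = BinaryHashPrior(BITS_PER_KM2)
prior.write(12.3, -4.5, [1, -1, 1, 1, -1, -1, 1, -1])
print(len(prior.table), DENSE_BYTES_PER_KM2)
print(prior.read(12.9, -4.1))  # same 1 m cell as the write above
```

Under these assumptions, one square kilometre at 1 m resolution has 10^6 cells but only 32,768 slots, so many cells share a slot. How the actual method learns features that tolerate such collisions, and how they are binarized during training, is not described in the abstract.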

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

1. The map prior encoding method proposed in this paper offers storage advantages compared to previous approaches.
2. The proposed method enables end-to-end optimization of map priors and perception tasks.

Weaknesses

1. The proposed method was only tested on a single dataset and lacks validation on other mainstream datasets, such as KITTI, Waymo, Argoverse2, etc.
2. The experimental section compares against outdated methods and lacks comparisons with the latest state-of-the-art approaches.
3. The proposed method is limited to datasets where the training and testing data have overlapping areas on the map, making it difficult to apply in real-world open scenarios.
4. The proposed model contains numerous hyperp

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The idea of efficiently leveraging historical traversals to inform 3D perception systems addresses a fundamental inefficiency in current approaches, which often treat every scene as novel despite repeated exposure.
2. The architecture is modular and demonstrated to work across several leading baselines (BEVDet, BEVFormer, PETR), with minimal intrusion. Thorough experiments: CMP is compared quantitatively against strong baselines, including modern learned and traditional map priors, across mul

Weaknesses

1. The method is explicitly described as being beneficial in well-traversed environments (Section 6), but its limitations in places with limited or no prior coverage are only superficially addressed via random patch masking. No rigorous experiments or quantitative breakdowns for novel/unseen areas are provided, raising concerns for real-world deployment.
2. Though BEVFormer, PETR, and BEVDet are credible representatives, modern BEV occupancy grid predictors (such as OccFeat or PointBeV, referen

Reviewer 03 · Rating 4 · Confidence 5

Strengths

Pros:
1. Consistent accuracy gains on nuScenes across three diverse baselines; largest relative lift on BEV-style models.
2. Simple, detector-agnostic add-on: clean fusion blocks for BEV and transformer stacks (concat+Conv vs. cross-attention).
3. Thoughtful ablations: traversal count sensitivity and distance-band analysis support the claim that "priors help when signal is weak/far".

Weaknesses

Cons:
1. While the method is shown across multiple camera-only 3D detectors, several stronger, recent baselines (e.g., StreamPETR, BEVNext) are missing. Without results on higher baselines, it's hard to judge headroom and true practical impact.
2. The approach assumes AVs mostly drive in previously seen areas where priors exist; the conclusion also notes retraining/retuning is needed for new environments, which reduces universality.
3. No stress tests for prior dropout or corruption. In deployme

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications