MADrive: Memory-Augmented Driving Scene Modeling
Polina Karpikova, Daniil Selikhanovych, Kirill Struminsky, Ruslan Musaev, Maria Golitsyna, Dmitry Baranchuk

TL;DR
MADrive introduces a memory-augmented scene reconstruction framework for autonomous driving that replaces observed vehicles with similar 3D assets, enabling photorealistic scene alterations and improved scene modeling.
Contribution
The paper presents MADrive, a framework that uses an external memory bank to replace observed vehicles during scene reconstruction, improving the realism and flexibility of autonomous driving scene modeling.
Findings
Enables photorealistic synthesis of altered driving scenes.
Replaces observed vehicles with similar 3D assets from a large-scale memory bank.
Demonstrates improved scene realism and flexibility in experiments.
Abstract
Recent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconstruction framework designed to extend the capabilities of existing scene reconstruction methods by replacing observed vehicles with visually similar 3D assets retrieved from a large-scale external memory bank. Specifically, we release MAD-Cars, a curated dataset of ~70K 360° car videos captured in the wild, and present a retrieval module that finds the most similar car instances in the memory bank, reconstructs the corresponding 3D assets from video, and integrates them into the…
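The retrieval step described in the abstract can be illustrated with a minimal nearest-neighbour sketch. This is not the authors' implementation: it assumes each car is already represented by a fixed-length embedding vector (the abstract does not say which encoder is used), ranks the memory bank by cosine similarity, and optionally restricts candidates by a color label, since the reviews mention color filtering. The function name `retrieve_top_k` and all of its parameters are hypothetical.

```python
import numpy as np


def retrieve_top_k(query, bank, k=3, bank_colors=None, query_color=None):
    """Rank memory-bank embeddings by cosine similarity to a query embedding.

    query:       (d,) embedding of the observed vehicle (assumed precomputed).
    bank:        (n, d) embeddings of the memory-bank cars (assumed precomputed).
    bank_colors: optional list of n color labels (hypothetical metadata).
    query_color: optional color label; if given with bank_colors, cars of
                 any other color are excluded from the ranking.
    Returns (indices, similarities) of the best k candidates, best first.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = b @ q

    # Optional color filter: excluded cars get similarity -inf.
    if bank_colors is not None and query_color is not None:
        keep = np.array([c == query_color for c in bank_colors])
        sims = np.where(keep, sims, -np.inf)

    # Sort by descending similarity and keep the first k entries.
    # If fewer than k cars pass the filter, trailing entries have -inf.
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]


if __name__ == "__main__":
    bank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    idx, scores = retrieve_top_k(np.array([1.0, 0.1]), bank, k=2)
    print(idx, scores)
```

Returning several candidates rather than a single best match leaves room for a later selection or fallback step, which one of the reviews below suggests.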
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. Retrieval + relightable asset insertion is a pragmatic route to full 360° vehicle coverage from sparse in-scene views.
2. Physically based relighting integrated with 2D Gaussian splats, enabling plausible appearance under new illumination without multi-illumination capture.
3. Consistent gains in MOTA/IDF1 and segmentation IoU on synthesized future frames, with informative qualitative results.
4. MAD-CARS (∼70k car videos) enables better retrieval and more realistic assets than smaller real-c

1. Test-time insertion uses ground-truth 3D boxes; robustness to noisy boxes or tracker outputs is not reported. Provide stress tests that perturb box positions/orientations and quantify impacts on all metrics.
2. While color filtering helps, mis-retrievals (wrong trim level, body kit, or subtle geometry) could degrade realism. Add user studies or automatic metrics for fine-grained make/model/color matching on held-out scenes; report failure cases and a fallback strategy (e.g., top-k retrieval

1. **Major Dataset Contribution:** The MAD-CARS dataset is a substantial contribution to the community. With ~70k instances, it dramatically expands the scale and diversity of publicly available multi-view car data. The detailed curation process described in the appendix further enhances its value and potential for future research in 3D reconstruction, novel view synthesis, and generative modeling.
2. **Strong and Comprehensive Technical Execution:** The proposed MADRIVE framework is technically

1. **Loss of Object Identity:** The most fundamental limitation is the trade-off between photorealism and identity preservation. The framework replaces an object with a *similar* one, not the *exact* one. This means any unique characteristics of the original vehicle (e.g., license plates, dents, scratches, bumper stickers, specific dirt patterns) are lost. For applications like "replaying" a safety-critical event for debugging, this loss of identity could be a critical flaw. The paper acknowledg

1. The paper is well written, with clear structure and professional presentation.
2. Figures and tables effectively communicate the method and results; visualizations are clear and informative.
3. The release of MAD-CARS is a meaningful community contribution that should facilitate reproducible research, stronger baselines, and broader progress in scene editing and AD simulation.

1. Comparative fairness. Since the method replaces in-scene vehicles with assets from a memory bank, direct comparisons against pure 3DGS baselines are not fully comparable. Please add evaluations against object-replacement or editing baselines (e.g., ChatSim [1]) under matched settings to establish fair performance gaps.
2. Quality and evaluation breadth. The presented results appear modest in fidelity (notably low resolution), lack vehicle rotation cases, and omit downstream AD task evaluatio

Clear motives and interesting research questions.
Clear method design and comprehensive technical details.
Detailed and rigorous experimental design.

1. The image quality in the paper is subpar, making the figures unsuitable for practical scenario applications. Clear, high-resolution images are essential for effectively illustrating key findings and ensuring the reproducibility of results, which the current figures fail to achieve.
2. The reconstruction results presented in the demo appear to be of low quality. Specific issues (e.g., blurred details, inaccurate structural restoration, or inconsistent texture mapping) are not adequately addressed,
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques · Computer Graphics and Visualization Techniques
