MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes
Luoxi Zhang, Chun Xie, Itaru Kitahara

TL;DR
This paper introduces MGP-KAD, a framework combining multimodal features and a Kolmogorov-Arnold decoder to improve single-view 3D reconstruction accuracy in complex scenes, addressing noise and object diversity challenges.
Contribution
It presents a novel multimodal fusion approach with geometric priors and a hybrid Kolmogorov-Arnold decoder, advancing the state-of-the-art in 3D reconstruction from single images.
Findings
Achieves state-of-the-art (SOTA) performance on the Pix3D dataset.
Enhances geometric detail and smoothness in reconstructions.
Effectively handles complex real-world scenes.
Abstract
Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB features with a geometric prior to enhance reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that dynamically adjust during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work…
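The abstract says the geometric prior comes from sampling and clustering ground-truth object data to produce class-level features. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each ground-truth object is an (N, 3) array of surface points, that sampling is uniform with replacement, and that clustering is plain k-means. The function names (`sample_points`, `kmeans`, `class_level_prior`), the sample count, and the number of clusters are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_points(points, n_samples, rng):
    """Uniformly sample n_samples points (with replacement) from one object."""
    idx = rng.integers(0, len(points), size=n_samples)
    return points[idx]

def kmeans(points, k, n_iters, rng):
    """Plain Lloyd's k-means; returns a (k, 3) array of centroids."""
    # Initialize centroids from k distinct input points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Pairwise distances between points and centroids, shape (N, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids

def class_level_prior(objects_by_class, n_samples=256, k=8, n_iters=10, seed=0):
    """Build one (k, 3) prior per class by pooling samples from its objects.

    objects_by_class: dict mapping class name -> list of (N, 3) point arrays.
    This layout is an assumption, not something the paper specifies.
    """
    rng = np.random.default_rng(seed)
    priors = {}
    for cls, objects in objects_by_class.items():
        pooled = np.concatenate(
            [sample_points(obj, n_samples, rng) for obj in objects], axis=0
        )
        priors[cls] = kmeans(pooled, k, n_iters, rng)
    return priors

# Toy usage with random stand-in "ground-truth" objects.
toy_rng = np.random.default_rng(1)
toy_data = {
    "chair": [toy_rng.normal(size=(500, 3)) for _ in range(3)],
    "table": [toy_rng.normal(loc=2.0, size=(400, 3)) for _ in range(2)],
}
priors = class_level_prior(toy_data)
```

The sketch covers only the offline construction step. The abstract also states that the class-level features adjust dynamically during training; one plausible reading, again an assumption, is that centroids like these initialize learnable parameters that the network then updates.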
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
