MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes

Luoxi Zhang; Chun Xie; Itaru Kitahara

arXiv:2602.06158·cs.CV·February 9, 2026

Open Access

TL;DR

This paper introduces MGP-KAD, a framework combining multimodal features and a Kolmogorov-Arnold decoder to improve single-view 3D reconstruction accuracy in complex scenes, addressing noise and object diversity challenges.

Contribution

It presents a novel multimodal fusion approach with geometric priors and a hybrid Kolmogorov-Arnold decoder, advancing the state-of-the-art in 3D reconstruction from single images.

Findings

01 Achieves state-of-the-art (SOTA) performance on the Pix3D dataset.

02 Enhances geometric detail and smoothness in reconstructions.

03 Effectively handles complex real-world scenes.

Abstract

Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB features with a geometric prior to enhance reconstruction accuracy. The geometric prior is generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work…
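The abstract describes the geometric prior only at a high level (sampling and clustering ground-truth object data into class-level features). The following is a minimal sketch of one way such a prior could be initialized, not the authors' implementation. It assumes each ground-truth object is available as a list of (x, y, z) points, uses uniform random sampling and plain Lloyd's k-means as stand-ins for whatever sampling and clustering the paper actually uses, and leaves out the dynamic adjustment during training. The function names, the sample count, and the cluster count are all hypothetical.

```python
import random


def sample_points(points, n, rng):
    """Uniformly sample n points (with replacement if the cloud is small)."""
    if len(points) >= n:
        return rng.sample(points, n)
    return [rng.choice(points) for _ in range(n)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means (assumed clustering choice); returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            buckets[nearest].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster goes empty
                centroids[j] = tuple(sum(c) / len(bucket) for c in zip(*bucket))
    return centroids


def build_class_priors(objects, n_samples=64, k=4, iters=10, seed=0):
    """Build one flat feature vector per class (hypothetical helper).

    objects: iterable of (class_name, list of (x, y, z) tuples).
    Returns {class_name: list of k * 3 centroid coordinates}.
    """
    rng = random.Random(seed)
    pooled = {}
    for cls, cloud in objects:
        # Pool samples from every ground-truth object of the same class.
        pooled.setdefault(cls, []).extend(sample_points(cloud, n_samples, rng))
    priors = {}
    for cls, pts in pooled.items():
        centroids = sorted(kmeans(pts, k, iters, rng))  # sort for a stable order
        priors[cls] = [coord for c in centroids for coord in c]
    return priors
```

Calling `build_class_priors([("chair", chair_points), ("table", table_points)])` returns one vector of `k * 3` values per class. In a pipeline like the one the abstract outlines, such a vector would serve as the initial class-level feature to be fused with RGB features and then refined during training.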

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization