GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

TL;DR
This paper introduces a series of modifications to large 3D reconstruction models that enhance geometry accuracy, computational efficiency, and texture fidelity, achieving state-of-the-art results and enabling rapid per-instance texture refinement.
Contribution
The paper presents novel architectural modifications, differentiable mesh extraction, and a fast texture refinement procedure to significantly improve 3D reconstruction quality and efficiency.
Findings
Achieves a PSNR of 28.67 on the GSO (Google Scanned Objects) dataset.
Improves texture reconstruction fidelity, including on text and portraits.
Per-instance texture refinement raises PSNR to 29.79 in about 4 seconds on an A100 GPU.
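To put the two PSNR figures above in perspective, the following is a minimal sketch using the standard PSNR definition, PSNR = 10 * log10(MAX^2 / MSE). It assumes pixel values normalized to [0, 1] (so MAX = 1) and treats each reported number as if it came from a single MSE value; how the paper aggregates PSNR over views and objects is not stated in this summary, so the result is only indicative. The function names are illustrative and are not taken from the paper.

```python
import math


def psnr(mse, max_val=1.0):
    """Standard PSNR in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert the PSNR formula to recover the implied mean squared error."""
    return max_val ** 2 / (10.0 ** (psnr_db / 10.0))


# Figures reported in the Findings list (assumed to be in dB).
mse_before = mse_from_psnr(28.67)
mse_after = mse_from_psnr(29.79)

# A gain of 1.12 dB corresponds to an MSE ratio of 10 ** (-0.112).
mse_ratio = mse_after / mse_before
print(f"implied MSE ratio after refinement: {mse_ratio:.3f}")
```

Under these assumptions, the 1.12 dB gain corresponds to roughly a 23% reduction in mean squared error.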
Abstract
We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, in our method, we introduce several important modifications that allow us to significantly enhance 3D reconstruction quality. First of all, we examine the original LRM architecture and find several shortcomings. Subsequently, we introduce respective modifications to the LRM architecture, which lead to improved multi-view image representation and more computationally efficient training. Second, in order to improve geometry reconstruction and enable supervision at full image resolution, we extract meshes from the NeRF field in a differentiable manner and fine-tune the NeRF model through mesh rendering. These…
Peer Reviews
Decision: ICLR 2025 Poster
1. Rapid Texture Refinement: The per-object texture refinement is lightweight and achieves faithful texture reconstruction, requiring a mere 4 seconds on an A100 GPU. 2. Architecture Modifications: By replacing the DINO ViT transformer with a convolutional encoder, and the deconvolution layers with a linear layer followed by a pixelshuffle, GTR reduces artifacts in the results and enhances high-frequency detail. These changes improve both the visual quality and the training efficiency of the model. 3. GTR
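The pixelshuffle step mentioned in the review above can be illustrated with a small sketch. This is not the authors' implementation: it shows only the channel-to-space rearrangement, in pure Python on nested lists, following the channel ordering documented for PyTorch's `nn.PixelShuffle`. The preceding linear layer is omitted, and the function name and list-based tensor layout are assumptions made for illustration.

```python
def pixel_shuffle(x, r):
    """Rearrange a [C*r*r][H][W] nested list into [C][H*r][W*r].

    Each group of r*r input channels fills one r-by-r block of output
    pixels, so spatial resolution grows by r with no overlapping kernels.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside the r-by-r block
            for j in range(r):      # column offset inside the block
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Four 1x1 channels become one 2x2 channel.
features = [[[0]], [[1]], [[2]], [[3]]]
upsampled = pixel_shuffle(features, 2)
print(upsampled)  # [[[0, 1], [2, 3]]]
```

Because every output pixel is written exactly once from a single input value, this rearrangement avoids the uneven kernel overlap that is commonly blamed for checkerboard artifacts in deconvolution layers, which is consistent with the artifact reduction the reviewer describes.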
1. Missing baseline: The paper does not compare with some stronger baselines, such as MeshLRM, whose first author released an online demo before the ICLR submission deadline. 2. The mesh quality is not satisfactory. In the abstract, the authors say this approach is for "3D mesh reconstruction". However, judging by the videos in the supplementary material, the quality is not satisfactory: the surfaces have many bumpy and grid-like artifacts. 3. The overall novelty is limited.
1. The paper proposes several designs that improve the quality of LRM. The paper performs ablation studies to validate the effectiveness of such designs.
1. The overall pipeline of the paper is similar to that of InstantMesh, which also applies LRM for mesh reconstruction with differentiable iso-surface methods. Both papers apply losses on depth and normal maps to improve the quality of the geometry. While the proposed designs are helpful, they do not provide very significant technical contributions. Recent works such as MeshLRM, GS-LRM, and LGM have adopted similar strategies to improve the network design and should be discussed here. 2.
1. The results are good, both qualitatively and quantitatively. 2. The experiments are solid. Many baselines are compared. 3. The paper is very well written.
1. This work shows a large improvement in quantitative results. However, it differs from previous works only through some incremental improvements in architecture. So what brings about such a large improvement? An ablation study of the proposed tricks would be much appreciated. 2. How long does it take to generate a single shape in total? 3. Will this work be open-sourced? 4. What are the common failure cases of this method? 5. In Table 1, some baselines have better results than yours (CD, Io
Taxonomy
Topics: 3D Shape Modeling and Analysis · Medical Image Segmentation Techniques · Advanced Neural Network Applications
