LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation Models

Ziqi Lu; Heng Yang; Danfei Xu; Boyi Li; Boris Ivanovic; Marco Pavone; Yue Wang

arXiv:2412.07746·cs.CV·December 11, 2024

Open Access · 3 Reviews

TL;DR

LoRA3D introduces an efficient self-calibration method for 3D geometric models that refines predictions and adapts models to new scenes using only multi-view images, achieving significant performance gains without external data.

Contribution

The paper presents LoRA3D, a novel self-calibration pipeline that specializes pre-trained 3D models to target scenes using multi-view predictions and low-rank adaptation, requiring minimal computation and no manual labels.
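The low-rank adaptation idea mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `LoRALinear`, the `alpha` scaling, and the initialization scheme are assumptions that follow the common LoRA recipe, in which a frozen weight matrix `W` is augmented by a trainable product `B @ A` of rank `r`.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # pre-trained weights, kept frozen
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank  # assumed scaling convention
        # A starts small and random, B starts at zero, so the adapted
        # layer initially reproduces the pre-trained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        # Only A and B are trained: r * (d_in + d_out) parameters.
        return self.rank * (self.d_in + self.d_out)


# Usage: a 64x64 identity layer adapted with rank 4.
identity = [[1.0 if i == j else 0.0 for j in range(64)] for i in range(64)]
layer = LoRALinear(identity, rank=4)
print(layer.num_trainable(), "trainable vs", 64 * 64, "frozen parameters")
```

Because only `A` and `B` are stored per scene, an adapter of this kind stays small relative to the full model, which is consistent with the compact per-adapter storage reported in the findings.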

Findings

01

Achieves up to 88% performance improvement on 3D reconstruction, multi-view pose estimation, and novel-view rendering.

02

Completes calibration on a single GPU in 5 minutes.

03

Requires only 18MB per low-rank adapter.

Abstract

Emerging 3D geometric foundation models, such as DUSt3R, offer a promising approach for in-the-wild 3D vision tasks. However, due to the high-dimensional nature of the problem space and scarcity of high-quality 3D data, these pre-trained models still struggle to generalize to many challenging circumstances, such as limited view overlap or low lighting. To address this, we propose LoRA3D, an efficient self-calibration pipeline to *specialize* the pre-trained models to target scenes using their own multi-view predictions. Taking sparse RGB images as input, we leverage robust optimization techniques to refine multi-view predictions and align them into a global coordinate frame. In particular, we incorporate prediction confidence into the geometric optimization process, automatically re-weighting the confidence to better reflect point estimation accuracy. We use the calibrated…
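The confidence re-weighting described in the abstract can be sketched on a toy problem. Everything specific here is an assumption for illustration: the re-weighting rule `conf / (1 + residual / mu) ** 2` is a Geman-McClure-style choice, the function names are hypothetical, and the one-dimensional scale fit stands in for the full multi-view alignment, which this page does not spell out.

```python
def reweight_confidence(conf, residuals, mu):
    """Shrink each predicted confidence where the geometric residual is large.

    Assumed Geman-McClure-style rule; mu sets how quickly weights decay.
    """
    return [c / (1.0 + r / mu) ** 2 for c, r in zip(conf, residuals)]


def fit_scale(pred, ref, weights):
    """Weighted least-squares scale s minimizing sum w * (s * pred - ref)^2."""
    num = sum(w * p * q for w, p, q in zip(weights, pred, ref))
    den = sum(w * p * p for w, p in zip(weights, pred))
    return num / den


def robust_scale(pred, ref, conf, mu=0.1, iters=10):
    """Alternate between fitting the scale and re-weighting the confidence."""
    weights = list(conf)
    scale = fit_scale(pred, ref, weights)
    for _ in range(iters):
        residuals = [abs(scale * p - q) for p, q in zip(pred, ref)]
        weights = reweight_confidence(conf, residuals, mu)
        scale = fit_scale(pred, ref, weights)
    return scale, weights


# Toy data: three consistent points (true scale 2) and one gross outlier
# that the model nevertheless predicted with full confidence.
pred = [1.0, 2.0, 3.0, 4.0]
ref = [2.0, 4.0, 6.0, 20.0]
conf = [1.0, 1.0, 1.0, 1.0]
scale, weights = robust_scale(pred, ref, conf)
print(scale, weights)
```

In this toy setting the outlier's weight collapses while the consistent points keep weights near their original confidence, so the re-weighted values track point accuracy more closely than the raw predictions did.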

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

The paper is very well written and easy to understand. I very much appreciate the core calibration framework section and the underlying method, in that it is technically principled, produces strong state-of-the-art improvements, and is presented in an intuitive manner. This is a great example of how ML/CV papers should be written. This is the first time I have seen LoRA used in the context of a 3D foundation model. I expect this to greatly increase the potential impact of the paper. The overall

Weaknesses

I find little weakness with the paper. Mainly, I would have liked to see more experiments on MASt3R in the body of the main work, and perhaps an inclusion of the calibrated-confidence-based pseudo labelling in the MASt3R integration. This is perhaps the only point that stops me from recommending a strong accept.

Reviewer 02 · Rating 8 · Confidence 2

Strengths

This task is quite intriguing, presenting an efficient and useful technique for calibrating pre-trained 3D models for specific 3D tasks. The concept of calibration has previously been employed in the field of uncertainty estimation to enhance output quality. Bringing calibration into a 3D foundation model presents a novel approach for "making foundation models" perform more effectively or align better with new or specialized tasks. The techniques are sound and easy to follow. Despite some poor observat

Weaknesses

The system is designed to adapt to a single scene, which appears to be a limitation of the DUSt3R approach. To evaluate performance, the methodology requires half of the test split data (1,000 images out of 2,000), which may limit practical applications. I am curious about the results if only 5%, 10%, or 30% of the data were available for calibration. Furthermore, it remains unclear what portion of the calibrated images comes from the TUM and Waymo datasets. Further details are needed regardin

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper presents an efficient method to finetune DUSt3R for individual scenes, showing considerable improvements in terms of reconstruction and camera estimation accuracy.

Weaknesses

***Missing baseline.*** A major weakness to me is the absence of a straightforward baseline: running a global optimisation, similar to classical global bundle adjustment, over all camera intrinsics, extrinsics, point maps, and scales, initialised from DUSt3R's predictions. This baseline would also avoid finetuning DUSt3R's weights. I am curious about a fair comparison with this baseline, in terms of optimisation time and per-scene optimisation accuracy. ***The comparison with COL

Taxonomy

Topics: Optical measurement and interference techniques · Advanced Measurement and Metrology Techniques · 3D Surveying and Cultural Heritage

Methods: Batch Normalization · Convolution · 1x1 Convolution · ALIGN · Thinned U-shape Module · Adapter