MutualNeRF: Improve the Performance of NeRF under Limited Samples with Mutual Information Theory

Zifan Wang, Jingwei Li, Yitang Li, Yunze Liu

arXiv:2505.11386 · cs.CV · June 10, 2025

TL;DR

MutualNeRF leverages Mutual Information Theory to enhance Neural Radiance Field performance with limited samples by strategically selecting viewpoints and maximizing information transfer, leading to improved 3D scene synthesis.

Contribution

This work introduces a theoretically grounded framework that uses Mutual Information to improve NeRF with limited data, including a greedy algorithm for viewpoint selection and regularization terms for few-shot view synthesis.
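
The greedy viewpoint-selection idea can be illustrated with a short sketch. It is not the authors' algorithm: the paper's mutual-information metric is replaced by a hypothetical `overlap(a, b)` score, and the selection rule assumed here (add the candidate whose largest overlap with any already-selected view is smallest) is only one plausible greedy criterion. The names `greedy_select_views` and `angular_overlap` are illustrative.

```python
import math
from typing import Callable, List, Sequence


def greedy_select_views(
    candidates: Sequence[int],
    initial: Sequence[int],
    k: int,
    overlap: Callable[[int, int], float],
) -> List[int]:
    """Greedily add up to k views to `initial`.

    `overlap` is a hypothetical stand-in for the paper's mutual-information
    metric: higher values mean two views share more scene content.
    """
    selected = list(initial)
    remaining = [c for c in candidates if c not in selected]
    for _ in range(min(k, len(remaining))):
        # Worst-case (largest) overlap of each candidate with the current set;
        # the candidate with the smallest worst case adds the most new content.
        best = min(
            remaining,
            key=lambda c: max((overlap(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def angular_overlap(a: int, b: int) -> float:
    """Toy overlap for cameras on a circle: 1 for the same azimuth, 0 for opposite."""
    return (1.0 + math.cos(math.radians(a - b))) / 2.0


angles = list(range(0, 360, 45))  # eight candidate azimuths in degrees
print(greedy_select_views(angles, initial=[0], k=2, overlap=angular_overlap))
```

Starting from the view at 0°, the toy run first adds the opposite camera (180°) and then one of the two perpendicular cameras. Each greedy step costs one overlap evaluation per (candidate, selected) pair, which is why greedy selection is attractive compared with searching over all subsets.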

Findings

1. Consistent improvement over state-of-the-art methods in limited sample scenarios
2. Effective viewpoint selection reduces mutual information overlap
3. Enhanced 3D scene synthesis quality with fewer samples

Abstract

This paper introduces MutualNeRF, a framework that enhances Neural Radiance Field (NeRF) performance under limited samples using Mutual Information Theory. While NeRF excels at 3D scene synthesis, it struggles when data are limited, and existing methods that introduce prior knowledge lack theoretical support within a unified framework. We introduce a simple but theoretically robust concept, Mutual Information, as a metric that uniformly measures the correlation between images at both the macro (semantic) and micro (pixel) levels. For sparse view sampling, we strategically select additional viewpoints that contain more non-overlapping scene information by minimizing mutual information, without knowing the ground-truth images beforehand. Our framework employs a greedy algorithm that offers a near-optimal solution. For few-shot view synthesis, we maximize the mutual information between inferred…
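
The abstract uses mutual information as a measure of how much two images have in common. For intuition, the sketch below computes it with the classical pixel-level estimator built from a joint intensity histogram. This is a textbook construction, not MutualNeRF's metric (which the paper defines at both semantic and pixel levels); the bin count and the assumption that intensities lie in [0, 1] are illustrative choices.

```python
import math
from collections import Counter
from typing import Sequence


def histogram_mutual_information(
    a: Sequence[float], b: Sequence[float], bins: int = 16
) -> float:
    """Estimate I(A; B) in nats between two same-sized grayscale images.

    Images are flat sequences of intensities assumed to lie in [0, 1]; the
    estimate comes from the joint intensity histogram.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("images must be non-empty and the same size")

    def to_bin(v: float) -> int:
        # Map an intensity in [0, 1] to a bin index in [0, bins - 1].
        return min(int(v * bins), bins - 1)

    n = len(a)
    bins_a = [to_bin(x) for x in a]
    bins_b = [to_bin(y) for y in b]
    joint = Counter(zip(bins_a, bins_b))
    count_a = Counter(bins_a)
    count_b = Counter(bins_b)
    # I(A; B) = sum_ij p_ij * log(p_ij / (p_i * p_j)), with p = count / n.
    return sum(
        (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in joint.items()
    )


img = [0.0, 1.0] * 50
print(histogram_mutual_information(img, img))  # identical images: ln 2 ≈ 0.693
```

Identical images score their full entropy, while statistically independent images score zero, which is the property that makes the quantity usable as an overlap measure between views.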

Peer Reviews

Decision · UAI 2025 Poster

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- This paper is well written and easy to follow.
- The framework's design is comprehensive, considering both macro (semantic) and micro (pixel) perspectives in the tasks of sparse view sampling and few-shot NVS.

Weaknesses

- **Complex methodology with marginal gains:** This methodology introduces significant complexity, especially in sparse view sampling, involving greedy algorithms and complex mutual information metrics. However, the observed improvements over simpler baselines are relatively minor, which may not justify the added complexity.
- **Lack of novelty:** The attempt to address the task of few-shot novel view synthesis through minimization of mutual information between viewpoints has already been explo…

Reviewer 02 · Rating 5 · Confidence 5

Strengths

The authors demonstrate the effectiveness of both sparse view sampling and few-shot view synthesis. Using mutual information to select views for NeRF is reasonable.

Weaknesses

1. The motivation for the pixel space distance is not clarified. According to Definition 3, the pixel space distance is the expectation of the distance between any two points on the rays. The authors should clarify the motivation behind Definition 3, since it is important for the following parts. The semantic space distance is fairly reasonable.
2. If my understanding is correct, for few-shot view synthesis the authors just add two regularization terms to NeRF training. However, the relationship…
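
To make the reviewer's paraphrase of Definition 3 concrete, the sketch below estimates, by Monte Carlo, the expected Euclidean distance between a point sampled on one camera ray and a point sampled on another. The sampling scheme (depths drawn uniformly from assumed `near`/`far` bounds) and all function names are hypothetical; the paper's exact definition may differ.

```python
import math
import random
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def point_on_ray(origin: Sequence[float], direction: Sequence[float], t: float) -> Vec3:
    """Return r(t) = origin + t * direction."""
    return tuple(o + t * d for o, d in zip(origin, direction))


def expected_ray_distance(
    o1: Sequence[float], d1: Sequence[float],
    o2: Sequence[float], d2: Sequence[float],
    near: float = 2.0, far: float = 6.0,
    samples: int = 10_000, seed: int = 0,
) -> float:
    """Monte Carlo estimate of E[||r1(t1) - r2(t2)||] with t1, t2 ~ U(near, far)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        p = point_on_ray(o1, d1, rng.uniform(near, far))
        q = point_on_ray(o2, d2, rng.uniform(near, far))
        total += math.dist(p, q)
    return total / samples


# Two identical rays along +z: the estimate approaches (far - near) / 3.
print(expected_ray_distance((0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1)))
```

Even two identical rays get a nonzero expected distance under this reading (for unit directions it equals E|t1 - t2| = (far - near) / 3), which is one reason the motivation behind such a definition deserves the explanation the reviewer asks for.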

Reviewer 03 · Rating 5 · Confidence 4

Strengths

- Utilizes mutual information theory to improve NeRF under limited data.
- Proposes a greedy algorithm for strategic viewpoint selection in sparse view sampling.
- Introduces efficient regularization terms to enhance few-shot view synthesis.

Weaknesses

1. How about using mutual information for 3DGS?
2. What is the rendering speed of the proposed method, especially compared with 3DGS?
3. I suggest that the authors compare with or discuss more methods, like PixelNeRF [A], CR-NeRF [B], and MVSGaussian [C], to fully verify the effectiveness of the proposed mutual information.
4. Why choose to pick images instead of training as a whole? Are there images that do not belong to the target scene? If we use a fast reconstruction method like 3DGS, trai…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging