Local 3D Editing via 3D Distillation of CLIP Knowledge
Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo

TL;DR
This paper introduces LENeRF, a novel method for localized 3D editing using text inputs, leveraging CLIP knowledge distillation to improve control and visual quality in NeRF-based 3D content manipulation.
Contribution
The paper proposes LENeRF, a framework with three add-on modules (the Latent Residual Mapper, the Attention Field Network, and the Deformation Network) that enable fine-grained, localized 3D editing guided solely by text, using unsupervised 3D attention learning from CLIP.
Findings
LENeRF achieves high-quality localized 3D edits.
The method outperforms existing approaches in visual fidelity and control.
Experiments demonstrate effective multi-view consistency and semantic accuracy.
Abstract
3D content manipulation is an important computer vision task with many real-world applications (e.g., product design, cartoon generation, and 3D avatar editing). Recently proposed 3D GANs can generate diverse photorealistic 3D-aware content using Neural Radiance Fields (NeRF). However, manipulation of NeRF remains a challenging problem, since visual quality tends to degrade after manipulation and suboptimal control handles such as 2D semantic maps are used for manipulation. While text-guided manipulation has shown potential in 3D editing, such approaches often lack locality. To overcome these problems, we propose Local Editing NeRF (LENeRF), which only requires text inputs for fine-grained and localized manipulation. Specifically, we present three add-on modules of LENeRF, the Latent Residual Mapper, the Attention Field Network, and the Deformation Network, which are…
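Illustrative sketch
The abstract is cut off before it explains how the three modules interact, so the snippet below is not the paper's implementation. It only illustrates the general idea of locality that the abstract emphasizes: a per-point soft attention value decides how much of an edited value replaces the original one. The function name `blend_local_edit`, the linear blending rule, and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def blend_local_edit(
    source: Sequence[float],
    target: Sequence[float],
    attention: Sequence[float],
) -> List[float]:
    """Blend per-point values with a soft mask (hypothetical helper).

    attention = 0 keeps the source value, attention = 1 takes the target
    value, and values in between interpolate linearly. This is an assumed
    blending rule, not one confirmed by the source text.
    """
    if not (len(source) == len(target) == len(attention)):
        raise ValueError("source, target, and attention must have equal length")

    blended = []
    for s, t, a in zip(source, target, attention):
        a = min(1.0, max(0.0, a))  # clamp the mask value to [0, 1]
        blended.append((1.0 - a) * s + a * t)
    return blended


# Toy data: four sample points, with the edit confined to the last two.
source = [0.2, 0.4, 0.6, 0.8]
target = [1.0, 1.0, 1.0, 1.0]
attention = [0.0, 0.0, 1.0, 0.5]

edited = blend_local_edit(source, target, attention)
print(edited)
```

Points whose attention is 0 are returned unchanged, which is the sense in which such an edit stays local; everything outside the masked region keeps its original value.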
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · 3D Surveying and Cultural Heritage
Methods: Contrastive Language-Image Pre-training
