ConTEXTure: Consistent Multiview Images to Texture
Jaehoon Ahn, Sumin Cho, Harim Jung, Kibeom Hong, Seonghoon Ban, Moon-Ryul Jung

TL;DR
ConTEXTure is a generative network that builds a consistent texture map for a 3D mesh from images of multiple viewpoints. The images are view-consistent and are generated conditioned on a text prompt and depth information.
Contribution
It introduces a method that generates view-consistent images for multiple viewpoints simultaneously, improving texture mapping accuracy over previous sequential approaches.
Findings
Produces viewpoint-accurate textures for 3D meshes
Ensures consistency across multiple viewpoints
Outperforms prior methods in texture quality
Abstract
We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure builds upon the TEXTure network, which uses text prompts for six viewpoints (e.g., 'Napoleon, front view', 'Napoleon, left view', etc.). However, TEXTure often generates images for non-front viewpoints that do not accurately represent those viewpoints. To address this issue, we employ Zero123++, which generates multiple view-consistent images for the six specified viewpoints simultaneously, conditioned on the initial front-view image and the depth maps of the mesh for the six viewpoints. By…
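The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Viewpoint` class, the six evenly spaced placeholder poses, and the four callables (`render_depth`, `text_to_image`, `multiview`, `project`) are hypothetical names introduced here, and the final projection step is an assumption, since the abstract is cut off before it describes how the images reach the texture atlas. The point of the sketch is that the non-front views come from a single joint call conditioned on the front view and all depth maps, rather than from one text prompt per view.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Viewpoint:
    name: str
    azimuth_deg: float    # rotation about the vertical axis, relative to the front view
    elevation_deg: float  # angle above the horizontal plane


# Placeholder poses only: six evenly spaced azimuths.
# The actual camera poses are NOT stated in the abstract.
VIEWPOINTS: List[Viewpoint] = [
    Viewpoint(f"view_{i}", azimuth_deg=60.0 * i, elevation_deg=0.0) for i in range(6)
]


def texture_from_prompt(
    subject: str,
    render_depth: Callable[[Viewpoint], Any],
    text_to_image: Callable[[str], Any],
    multiview: Callable[[Any, Sequence[Any]], Sequence[Any]],
    project: Callable[[Dict[str, Any], Viewpoint, Any], None],
) -> Dict[str, Any]:
    """Hypothetical pipeline: prompt -> front view -> joint multiview -> atlas."""
    # Step 1: a single front-view image from the text prompt.
    front = text_to_image(f"{subject}, front view")

    # Step 2: one depth map of the mesh per viewpoint.
    depth_maps = [render_depth(v) for v in VIEWPOINTS]

    # Step 3: all views are generated in ONE call (the role Zero123++ plays
    # in the abstract), conditioned on the front view and the depth maps.
    views = multiview(front, depth_maps)
    if len(views) != len(VIEWPOINTS):
        raise ValueError("expected one generated image per viewpoint")

    # Step 4 (assumed): project each view onto a shared texture atlas.
    atlas: Dict[str, Any] = {}
    for viewpoint, image in zip(VIEWPOINTS, views):
        project(atlas, viewpoint, image)
    return atlas
```

Passing the models in as callables keeps the sketch independent of any particular diffusion library; stand-in functions are enough to exercise the control flow.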
Taxonomy
Topics: Computer Graphics and Visualization Techniques
