TexTailor: Customized Text-aligned Texturing via Effective Resampling
Suin Lee, Dae-Shik Kim

TL;DR
TexTailor is a new method that improves the consistency of object textures generated from text descriptions by integrating previous textures, fine-tuning a depth-aware diffusion model, and adaptively selecting camera viewpoints based on object geometry.
Contribution
It introduces a resampling scheme for texture integration, a performance preservation loss for limited training data, and adaptive camera positioning to enhance view-consistent text-to-texture synthesis.
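This summary does not state how TexTailor actually chooses its camera positions, so the following is only a generic illustration of what geometry-aware view selection can look like, not the authors' algorithm. It is a minimal sketch of a greedy coverage heuristic: given, for each candidate view, the set of mesh faces visible from it, repeatedly pick the view that exposes the most faces that have not been textured yet. The function name `select_views`, the `min_new_faces` threshold, and the toy visibility sets are all hypothetical.

```python
from typing import Dict, List, Set


def select_views(visible: Dict[str, Set[int]], min_new_faces: int = 1) -> List[str]:
    """Greedily order candidate views by how many untextured faces each adds.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    covered: Set[int] = set()
    order: List[str] = []
    remaining = dict(visible)
    while remaining:
        # Pick the view that reveals the most faces not yet textured.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=lambda v: len(remaining[v] - covered))
        gain = len(remaining[best] - covered)
        if gain < min_new_faces:
            break  # no remaining view adds enough new surface
        order.append(best)
        covered |= remaining.pop(best)
    return order


# Toy visibility sets (assumed data): face ids visible from each candidate view.
candidates = {
    "front": {0, 1, 2, 3},
    "front_left": {2, 3, 4},
    "back": {5, 6, 7},
    "top": {3, 4, 8},
}
chosen = select_views(candidates)
print(chosen)
```

In this toy setup the `front_left` view is skipped because every face it sees has already been covered by the time it would be considered, which is the kind of redundancy a fixed, geometry-agnostic camera schedule cannot detect.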
Findings
Outperforms state-of-the-art in view-consistent texture synthesis
Effective in generating high-fidelity, texture-aligned images
Demonstrates robustness on Objaverse and ShapeNet datasets
Abstract
We present TexTailor, a novel method for generating consistent object textures from textual descriptions. Existing text-to-texture synthesis approaches utilize depth-aware diffusion models to progressively generate images and synthesize textures across predefined multiple viewpoints. However, these approaches lead to a gradual shift in texture properties across viewpoints due to (1) insufficient integration of previously synthesized textures at each viewpoint during the diffusion process and (2) the autoregressive nature of the texture synthesis process. Moreover, the predefined selection of camera positions, which does not account for the object's geometry, limits the effective use of texture information synthesized from different viewpoints, ultimately degrading overall texture consistency. In TexTailor, we address these issues by (1) applying a resampling scheme that repeatedly…
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper proposes TexTailor to address view-consistent texture synthesis by combining inpainting with resampling and fine-tuning.
2. Method and results are presented clearly and logically, making the paper easy to follow.
1. While effective, the approach primarily combines existing techniques, with limited emphasis on novel contributions. The paper could be strengthened by enhancing the resampling scheme or accelerating the fine-tuning phase.
- This paper adequately identifies problems in previous methods for text-driven object texturing, including a lack of texture consistency and gradual shifts in texture. These problems are attributed to insufficient integration, predefined camera positions, and autoregression. The paper introduces changes to these methods to enhance their quality and consistency. This is an important line of research, as these works are becoming more prevalent in the literatu
- While sound, the ideas introduced in this work are somewhat limited in scope, and the paper does not make a compelling case that they are particularly effective. In this sense, I am not convinced of the extent to which these contributions will be impactful in the literature. Furthermore, the resampling scheme introduced in this paper is not new, as it is borrowed from previous work. Therefore, the ideas introduced here are neither particularly novel nor significant.
- Insufficient results are shown on t
## Motivation
The paper starts with an analysis of the limitations of previous methods. It hypothesizes that the inconsistent results of previous methods mainly come from an inappropriate way of integrating information from previously synthesized textures. Given this insight, it tries to address the inconsistency issue by proposing a new approach that makes better use of information across different viewpoints and previously synthesized textures. The motivation of the paper is more about a t
What concerns me most in this paper is the motivation behind some technical parts and the unclear writing.
## Motivation
- In Line 93, it is not clear to me why fine-tuning a depth-aware T2I model matters. Including a brief explanation could be helpful.
## Method
- In Section 3.1, the authors propose a non-Markov process to reduce the sampling steps. However, its benefits are unclear to me. Would it lead to faster sampling? If so, there is no result to support
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques
Methods: Diffusion
