SeqTex: Generate Mesh Textures in Video Sequence

Ze Yuan (1); Xin Yu (1); Yangtian Sun (1); Yuan-Chen Guo (2); Yan-Pei Cao (2); Ding Liang (2); Xiaojuan Qi (1) ((1) HKU; (2) VAST)

arXiv:2507.04285 · cs.CV · July 8, 2025

TL;DR

SeqTex introduces an end-to-end method leveraging pretrained video models to directly generate high-quality, consistent UV texture maps for 3D models, overcoming limitations of existing multi-stage approaches.

Contribution

SeqTex reformulates 3D texture generation as a sequence modeling task, enabling direct UV map synthesis with video foundation models and new architectural components.

Findings

01 Achieves state-of-the-art results in 3D texture generation

02 Demonstrates superior 3D consistency and texture-geometry alignment

03 Generalizes well to real-world scenarios

Abstract

Training native 3D texture generative models remains a fundamental yet challenging problem, largely due to the limited availability of large-scale, high-quality 3D texture datasets. This scarcity hinders generalization to real-world scenarios. To address this, most existing methods finetune foundation image generative models to exploit their learned visual priors. However, these approaches typically generate only multi-view images and rely on post-processing to produce UV texture maps -- an essential representation in modern graphics pipelines. Such two-stage pipelines often suffer from error accumulation and spatial inconsistencies across the 3D surface. In this paper, we introduce SeqTex, a novel end-to-end framework that leverages the visual knowledge encoded in pretrained video foundation models to directly generate complete UV texture maps. Unlike previous methods that model the…
