Meshtron: High-Fidelity, Artist-Like 3D Mesh Generation at Scale
Zekun Hao, David W. Romero, Tsung-Yi Lin, Ming-Yu Liu

TL;DR
Meshtron is a scalable autoregressive model that generates high-fidelity 3D meshes with up to 64K faces and 1024-level coordinate resolution, significantly surpassing previous methods in detail and realism.
Contribution
The paper introduces Meshtron, a novel model that enables high-resolution, artist-like 3D mesh generation at scale, overcoming limitations of face count and coordinate resolution in prior work.
Findings
Generates meshes with up to 64K faces and 1024-level coordinate resolution.
Uses over 50% less training memory and achieves 2.5x higher throughput than existing works.
Produces detailed, realistic 3D meshes comparable to those created by professional artists.
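To make "1024-level coordinate resolution" concrete, here is a minimal sketch of uniform coordinate quantization. It assumes coordinates are normalized to [-1, 1] and binned uniformly; the function names, the normalization range, and the bin-center decoding are illustrative assumptions, not details taken from the paper.

```python
def quantize(coord, levels=1024, lo=-1.0, hi=1.0):
    """Map a coordinate in [lo, hi] to an integer bin in [0, levels - 1].

    Assumption: uniform bins over a normalized bounding range.
    """
    t = (coord - lo) / (hi - lo)          # position in [0, 1]
    return min(levels - 1, max(0, int(t * levels)))


def dequantize(index, levels=1024, lo=-1.0, hi=1.0):
    """Return the center of bin `index` as a float coordinate."""
    return lo + (index + 0.5) * (hi - lo) / levels


def max_error(levels, lo=-1.0, hi=1.0):
    """Worst-case round-trip error: half of one bin width."""
    return (hi - lo) / (2 * levels)


if __name__ == "__main__":
    x = 0.3
    q = quantize(x)
    print(q, dequantize(q), abs(dequantize(q) - x))
    print(max_error(1024))
```

Under these assumptions, each bin at 1024 levels is 2/1024 wide, so a decoded coordinate is never more than 1/1024 of the normalized range away from the original value. Fewer levels mean proportionally wider bins and a coarser vertex grid.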
Abstract
Meshes are fundamental representations of 3D surfaces. However, creating high-quality meshes is a labor-intensive task that requires significant time and expertise in 3D modeling. While a delicate object often requires over 10^4 faces to be accurately modeled, recent attempts at generating artist-like meshes are limited to 1.6K faces and heavy discretization of vertex coordinates. Hence, scaling both the maximum face count and vertex coordinate resolution is crucial to producing high-quality meshes of realistic, complex 3D objects. We present Meshtron, a novel autoregressive mesh generation model able to generate meshes with up to 64K faces at 1024-level coordinate resolution, over an order of magnitude higher face count and higher coordinate resolution than current state-of-the-art methods. Meshtron's scalability is driven by four key components: (1) an hourglass…
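A rough sense of why face count is the scaling bottleneck for an autoregressive model comes from simple arithmetic on sequence length. The sketch below assumes each triangle is serialized as three vertices with three quantized coordinates each, one token per coordinate. That tokenization, and the function names, are assumptions for illustration and are not stated in this summary.

```python
def sequence_length(num_faces, verts_per_face=3, coords_per_vert=3):
    """Number of tokens needed to serialize a triangle mesh.

    Assumption: one token per quantized coordinate, no special tokens.
    """
    return num_faces * verts_per_face * coords_per_vert


def face_budget(max_tokens, verts_per_face=3, coords_per_vert=3):
    """Largest face count that fits in a context of `max_tokens` tokens."""
    return max_tokens // (verts_per_face * coords_per_vert)


if __name__ == "__main__":
    # The summary does not say whether "64K" means 64,000 or 65,536,
    # so both readings are shown.
    print(sequence_length(64_000))   # 576000
    print(sequence_length(65_536))   # 589824
    print(face_budget(8_192))        # 910
```

Under this assumption a 64K-face mesh corresponds to a sequence of well over half a million tokens, which is why training memory and throughput matter as much as model quality here.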
Peer Reviews
Decision · Submitted to ICLR 2025
The paper addresses the challenge of 3D mesh generation by achieving high-quality outputs with up to 64K faces at 1024-level coordinate resolution, representing a significant advancement over existing methods. The intuition and motivation behind the approach are clearly articulated, supported by a thoughtful discussion of real-world data. The authors further validate their method by conditioning on a variety of mesh types, including artist-created meshes and text-to-3D generated meshes, demonstr
First, I find it unclear why the Hourglass Transformer is considered the appropriate solution to address the challenge of generating the latter tokens of a triangle. While the authors explain both the difficulty and the Hourglass Transformer reasonably well, the connection between the two could be more explicitly justified. Second, the model’s reliance on curated, high-quality datasets raises concerns about scalability and applicability to domains with limited or noisy data. In real-world scenar
The work is well ablated, and the choice in architectural design taken by the authors is reasonable and easy to understand. Overall, the work is well written and well structured, making it easy to understand why and what was chosen and how it was implemented. The appendix is additionally very useful to reimplement this work. Overall, the work seems to have well-reasoned analysis for training mesh generations from autoregressive models and additionally adds limitations to the mesh generation proc
I am not an expert in this research area, but for me, the main weakness is the claim that "recent attempts at generating artist-like meshes are limited to 1.6K faces and heavy discretization of vertex coordinates". I want to refer the authors to works such as [1, 2] that show mesh reconstruction with more than 1.6K faces. Both of these works show high-quality reconstructions, yet the authors do not compare against them. [1] Shen, Tianchang, et al. "Flexible Isosurface Extrac
The proposed method addresses an important challenge in 3D mesh generation: the number of faces significantly affects the quality and fidelity of the results, making a higher face count crucial. Experimental results indicate that this method can produce highly detailed meshes. Additionally, the authors present the motivation for their algorithm design, based on observed issues in real-world data, offering valuable insights for the academic community.
The authors mention that the tokens of the last vertex have high perplexity, which led to the design of the Hourglass Transformer to address this issue. However, the causal relationship here does not seem entirely clear. Within the architecture of the Hourglass Transformer, there is no obvious special treatment for the last vertex tokens. While the Hourglass Transformer is an effective structure, linking it directly to the perplexity of the last vertex tokens feels somewhat tenuous. In the exp
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Interactive and Immersive Displays · Architecture and Computational Design
