MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

Sankalp Sinha; Mohammad Sadil Khan; Muhammad Usama; Shino Sam; Didier Stricker; Sk Aziz Ali; Muhammad Zeshan Afzal

arXiv:2411.17945·cs.CV·March 27, 2025

TL;DR

This paper introduces MARVEL-40M+, a large-scale dataset with multi-level annotations for 3D assets, and a two-stage text-to-3D pipeline that enhances high-fidelity content creation from text prompts.

Contribution

It presents a novel multi-stage annotation pipeline combining VLMs and LLMs, and develops MARVEL-FX3D, a fast text-to-3D generation method, advancing dataset quality and generation speed.

Findings

01

MARVEL-40M+ outperforms existing datasets in annotation quality and diversity.

02

MARVEL-40M+ annotations achieve a win rate of 72.41% with GPT-4 as the judge and 73.40% with human evaluators.

03

MARVEL-FX3D generates textured 3D meshes within 15 seconds.

Abstract

Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of the existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from source datasets into our annotation pipeline to add domain-specific information in our annotation and reduce VLM hallucinations. Additionally, we develop…
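
The multi-level structure described in the abstract can be pictured as one record per asset and level, each with a target length. The following is a minimal sketch only: the abstract states the two word-count ranges (150-200 words for detailed descriptions, 10-20 words for concise semantic tags), while the level names, the `Annotation` class, and the asset identifier are hypothetical and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass

# Word-count ranges quoted in the abstract for the two ends of the hierarchy.
# The level names are illustrative labels, not identifiers from the dataset.
WORD_RANGES = {
    "detailed": (150, 200),
    "concise_tags": (10, 20),
}


@dataclass
class Annotation:
    asset_id: str  # hypothetical identifier for a 3D asset
    level: str     # one of the keys in WORD_RANGES
    text: str

    def word_count(self) -> int:
        # Whitespace-separated tokens are counted as words.
        return len(self.text.split())

    def in_range(self) -> bool:
        # True when the text length matches the range stated for its level.
        low, high = WORD_RANGES[self.level]
        return low <= self.word_count() <= high


tags = Annotation(
    asset_id="asset-0001",
    level="concise_tags",
    text="wooden chair, four legs, curved backrest, brown, rustic, "
         "indoor furniture, polished",
)
print(tags.word_count(), tags.in_range())
```

A check like `in_range` could serve as a simple length filter when sampling captions at a given level; whether the authors apply such a filter is not stated in the abstract.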

Code & Models

Datasets

sankalpsinha77/MARVEL-40M
dataset · 37 downloads

Taxonomy

Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques

Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Adam · Residual Connection · Softmax