CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

Rajeev Goel; Jason Ding; Phani Harish Wajjala; Pavan Turaga; Tejaswi Gowda; Krishna C. Garikipati

arXiv:2605.18680·cs.CV·May 19, 2026

TL;DR

CMAG is a framework for marketplace avatar generation that combines a 3D concept scaffold, localized visual evidence, and iterative verification to make asset retrieval more robust and avatars assembled from catalog 3D assets more compositionally accurate.

Contribution

It proposes a new concept-scaffolded retrieval and verification framework that enhances avatar assembly from catalog assets under ambiguous prompts.

Findings

01 CMAG outperforms strong baselines in retrieval robustness.

02 It achieves higher compositional correctness in avatar generation.

03 The framework effectively handles prompt ambiguity and topology constraints.

Abstract

Metaverse platforms rely on creator-driven marketplaces where avatars are assembled from discrete, taxonomy-labeled 3D assets (e.g., tops, bottoms, shoes, accessories) under strict category and topology constraints. While users increasingly expect free-form text control, text-only retrieval is brittle: natural language is ambiguous with respect to platform taxonomies, metadata is often noisy or informal, and independently retrieved components can be stylistically inconsistent or geometrically incompatible. We propose CMAG, a concept-scaffolded retrieval and verified composition framework for marketplace avatar generation. Given a prompt, CMAG first synthesizes an intermediate 3D concept scaffold that disambiguates intent beyond text by providing global spatial and stylistic context. In parallel, a view-aware part discovery module extracts localized visual evidence via prompt…
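To make the brittleness of text-only retrieval concrete, here is a minimal sketch of category-constrained retrieval in which a visual cue breaks a tie that text metadata alone cannot. It is an illustration under stated assumptions, not the paper's method: the `Asset` fields, the `retrieve` function, the `alpha` weight, the linear score fusion, and the toy two-dimensional embeddings are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Asset:
    # Hypothetical catalog record; field names are assumptions for illustration.
    asset_id: str
    category: str      # taxonomy label, e.g. "top" or "shoes"
    text_emb: tuple    # embedding of the (possibly noisy) metadata
    image_emb: tuple   # embedding of a rendered view of the asset


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(catalog, category, text_query, visual_query, alpha=0.5):
    """Return the best asset within one taxonomy category, or None.

    alpha weights text similarity against visual similarity (assumed fusion
    rule); alpha=1.0 reduces to text-only retrieval.
    """
    candidates = [a for a in catalog if a.category == category]
    if not candidates:
        return None

    def score(asset):
        return (alpha * cosine(text_query, asset.text_emb)
                + (1 - alpha) * cosine(visual_query, asset.image_emb))

    return max(candidates, key=score)


# Toy catalog: both tops carry identical metadata embeddings, so text alone
# cannot tell them apart; only the visual embedding differs.
catalog = [
    Asset("top_red_hoodie", "top", (1.0, 0.0), (1.0, 0.0)),
    Asset("top_blue_hoodie", "top", (1.0, 0.0), (0.0, 1.0)),
    Asset("shoes_blue", "shoes", (1.0, 0.0), (0.0, 1.0)),
]
text_query = (1.0, 0.0)
visual_query = (0.0, 1.0)  # stand-in for evidence taken from a concept scaffold

fused = retrieve(catalog, "top", text_query, visual_query)
text_only = retrieve(catalog, "top", text_query, visual_query, alpha=1.0)
```

With these toy values, the text-only call sees a tie between the two tops and falls back to catalog order, while the fused call selects the asset whose rendered view matches the visual query. The category filter keeps the shoes asset out of the "top" slot regardless of its score, which mirrors the strict category constraint described in the abstract.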
