Static and Animated 3D Scene Generation from Free-form Text Descriptions
Faria Huq, Nafees Ahmed, Anindya Iqbal

TL;DR
This paper presents a two-stage neural pipeline that generates static and animated 3D scenes from free-form text descriptions. It focuses on simple geometric shapes and reports 98.427% accuracy in object feature detection.
Contribution
The work introduces a flexible pipeline for generating scenes from free-form text, using language models and multi-head decoders, so that input is not limited to rigid sentence structures.
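To make the multi-head decoder idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a first stage has already turned the text into a fixed-length embedding, and that the second stage applies one independent linear classification head per object feature. The feature names (`shape`, `color`, `size`), their class lists, the embedding size, and the random weights are all hypothetical placeholders chosen for illustration.

```python
import random

# Hypothetical feature vocabularies; the paper's actual feature set is not listed here.
FEATURES = {
    "shape": ["cube", "sphere", "cone"],
    "color": ["red", "green", "blue"],
    "size": ["small", "large"],
}


def make_heads(embed_dim, seed=0):
    """Build one random weight matrix (embed_dim x n_classes) per feature head."""
    rng = random.Random(seed)
    return {
        name: [[rng.gauss(0.0, 1.0) for _ in classes] for _ in range(embed_dim)]
        for name, classes in FEATURES.items()
    }


def decode(embedding, heads):
    """Run every head on the same shared embedding and pick its top-scoring class."""
    scene = {}
    for name, weights in heads.items():
        classes = FEATURES[name]
        # Logit for class j is the dot product of the embedding with column j.
        logits = [
            sum(x * row[j] for x, row in zip(embedding, weights))
            for j in range(len(classes))
        ]
        scene[name] = classes[logits.index(max(logits))]
    return scene


# Stand-in for the first stage's output: a random 16-dimensional text embedding.
rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(16)]
heads = make_heads(16)
scene = decode(embedding, heads)
print(scene)
```

Each head reads the same embedding but predicts a different attribute, which is why wording changes in the input only need to be handled once, by the encoder. In a trained system the weights would be learned rather than sampled at random.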
Findings
Achieved 98.427% accuracy in object feature detection
Generated a large synthetic dataset with over 1.3 million static and 1.4 million animated samples
Demonstrated proof of concept for broader 3D scene generation from natural language
Abstract
Generating coherent and useful image or video scenes from a free-form textual description is a technically difficult problem. Textual descriptions of the same scene can vary greatly from person to person, and sometimes even for the same person from time to time. Because the choice of words and syntax varies from one description to another, it is challenging for a system to reliably produce consistent, desirable output from different forms of language input. Prior work on scene generation has mostly been confined to rigid sentence structures for text input, which restricts users' freedom in writing descriptions. In our work, we study a new pipeline that aims to generate static as well as animated 3D scenes from different types of free-form textual scene descriptions without any major restriction. In particular, to keep our study practical and tractable, we focus on…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Human Motion and Animation
