Static and Animated 3D Scene Generation from Free-form Text Descriptions

Faria Huq; Nafees Ahmed; Anindya Iqbal

arXiv:2010.01549 · cs.CV · December 1, 2020

TL;DR

This paper presents a two-stage neural pipeline that generates static and animated 3D scenes from free-form text descriptions. It focuses on simple geometric shapes and reports 98.427% accuracy in object feature detection.

Contribution

The work introduces a flexible scene generation pipeline from free-form text using advanced language models and multi-head decoders, expanding beyond rigid sentence structures.
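As a rough illustration of what a multi-head decoder means in general, the sketch below feeds one shared text embedding to several independent classification heads, one per object attribute. It is a minimal sketch under stated assumptions, not the authors' implementation: the attribute names, label sets, embedding size, and random weights are all hypothetical stand-ins, and no trained language model is involved.

```python
import random

# Hypothetical attribute vocabularies (assumption: the paper's real label sets are not listed here).
HEADS = {
    "shape": ["cube", "sphere", "cone"],
    "color": ["red", "green", "blue"],
    "size": ["small", "large"],
}

def make_head(in_dim, n_classes, rng):
    """Random weight matrix standing in for a trained linear classification head."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_classes)]

def decode(embedding, heads):
    """Apply every head to the same shared embedding and take the argmax per head."""
    result = {}
    for name, weights in heads.items():
        # One logit per class: dot product of the class row with the embedding.
        logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
        best = max(range(len(logits)), key=logits.__getitem__)
        result[name] = HEADS[name][best]
    return result

rng = random.Random(0)
EMBED_DIM = 8  # hypothetical embedding size
heads = {name: make_head(EMBED_DIM, len(labels), rng) for name, labels in HEADS.items()}

# Stand-in for a language-model encoding of one scene description.
embedding = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]

prediction = decode(embedding, heads)
print(prediction)
```

Each head reads the same embedding but predicts its own attribute, so adding a new attribute only requires adding a new head rather than changing the shared encoder.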

Findings

01 Achieved 98.427% accuracy in object feature detection

02 Generated a large synthetic dataset with over 1.3 million static and 1.4 million animated samples

03 Demonstrated proof of concept for broader 3D scene generation from natural language

Abstract

Generating coherent and useful image or video scenes from a free-form textual description is a technically difficult problem. Textual descriptions of the same scene can vary greatly from person to person, and sometimes even for the same person from one time to the next. Because word choice and syntax vary from one description to another, it is challenging for a system to reliably produce consistently desirable output from different forms of language input. Prior work on scene generation has mostly been confined to rigid sentence structures for the text input, which restricts users' freedom in writing descriptions. In our work, we study a new pipeline that aims to generate static as well as animated 3D scenes from different types of free-form textual scene descriptions without any major restriction. In particular, to keep our study practical and tractable, we focus on…

Code & Models

Repositories

oaishi/3DScene_from_text (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Human Motion and Animation