When ChatGPT for Computer Vision Will Come? From 2D to 3D
Chenghao Li, Chaoning Zhang

TL;DR
This paper reviews the progress of deep learning in text, image, and 3D vision, discusses the evolution of AI-generated content (AIGC), and provides an outlook on developing a ChatGPT-like model for 3D computer vision.
Contribution
It offers a comprehensive overview of deep learning advancements across modalities and explores future directions for AIGC in 3D vision, highlighting the need for a unified model.
Findings
Deep learning has significantly advanced NLP, image, and 3D fields.
The evolution of AIGC can be traced from the data perspective.
Future development of 3D AIGC requires new model architectures.
Abstract
ChatGPT and its improved variant GPT-4 have revolutionized the NLP field with a single model solving almost all text-related tasks. However, such a model for computer vision does not exist, especially for 3D vision. This article first provides a brief overview of the progress of deep learning in the text, image, and 3D fields from the model perspective. Moreover, this work further discusses how AIGC evolves from the data perspective. On top of that, this work presents an outlook on the development of AIGC in 3D from the data perspective.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
