Compositional Scene Representation Learning via Reconstruction: A Survey
Jinyang Yuan, Tonglin Chen, Bin Li, Xiangyang Xue

TL;DR
This survey reviews recent advances in learning compositional scene representations through reconstruction using deep neural networks, highlighting progress, benchmarks, limitations, and future directions in the field.
Contribution
The survey provides a comprehensive overview of reconstruction-based compositional scene representation learning methods, covering their development history, categorizations, benchmarks, and open-source tools.
Findings
Outlines current progress in deep learning methods that learn compositional scene representations via reconstruction
Provides benchmark datasets and an open-source toolbox
Discusses limitations of current methods and directions for future research
Abstract
Visual scenes are composed of visual concepts and have the property of combinatorial explosion. An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene…
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
