Semantic RGB-D Image Synthesis
Shijie Li, Rong Li, Juergen Gall

TL;DR
This paper introduces a multi-modal semantic RGB-D image synthesis method that generates realistic RGB-D images from semantic labels, improving data diversity and segmentation accuracy in privacy-sensitive applications.
Contribution
The paper proposes a novel generator and discriminator architecture for multi-modal data that improves the realism and semantic consistency of synthesized RGB-D images.
Findings
Significantly outperforms previous uni-modal methods extended to RGB-D data
Mixing real and generated images improves segmentation accuracy
Generates realistic RGB-D images from semantic label maps
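The finding about mixing real and generated images can be illustrated with a minimal sketch of how such a training set might be assembled. This is not the authors' pipeline: the `Sample` record, the function name `mix_real_and_generated`, and the `generated_fraction` parameter are all hypothetical, and the summary does not state what mixing ratio the paper uses.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample record: (rgb_path, depth_path, label_path).
Sample = Tuple[str, str, str]


def mix_real_and_generated(
    real: Sequence[Sample],
    generated: Sequence[Sample],
    generated_fraction: float = 0.5,  # assumed ratio, not taken from the paper
    seed: int = 0,
) -> List[Sample]:
    """Return a shuffled training list that keeps every real sample and adds
    generated samples until they make up roughly `generated_fraction` of it."""
    if not 0.0 <= generated_fraction < 1.0:
        raise ValueError("generated_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_gen / (n_real + n_gen) = f for n_gen.
    wanted = round(generated_fraction * len(real) / (1.0 - generated_fraction))
    # Never request more generated samples than are available.
    n_generated = min(wanted, len(generated))
    picked = rng.sample(list(generated), n_generated)
    mixed = list(real) + picked
    rng.shuffle(mixed)
    return mixed
```

With the default fraction of 0.5, the returned list holds as many generated samples as real ones, provided enough generated samples are available. The fixed seed keeps the selection reproducible across runs.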
Abstract
Collecting diverse sets of training images for RGB-D semantic image segmentation is not always possible. In particular, when robots need to operate in privacy-sensitive areas like homes, the collection is often limited to a small set of locations. As a consequence, the annotated images lack diversity in appearance and approaches for RGB-D semantic image segmentation tend to overfit the training data. In this paper, we thus introduce semantic RGB-D image synthesis to address this problem. It requires synthesising a realistic-looking RGB-D image for a given semantic label map. Current approaches, however, are uni-modal and cannot cope with multi-modal data. Indeed, we show that extending uni-modal approaches to multi-modal data does not perform well. In this paper, we therefore propose a generator for multi-modal data that separates modal-independent information of the semantic layout…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
