Learning Hierarchical Semantic Image Manipulation through Structured Representations
Seunghoon Hong, Xinchen Yan, Thomas Huang, Honglak Lee

TL;DR
This paper introduces a hierarchical semantic image manipulation framework that uses structured semantic layouts for more precise and flexible image editing, outperforming previous methods in quality and controllability.
Contribution
The work presents a novel hierarchical approach employing structured semantic layouts for object-level image manipulation, enabling more accurate and flexible editing capabilities.
Findings
Outperforms existing models in qualitative and quantitative evaluations.
Enables object-level manipulation such as adding, removing, and moving objects.
Demonstrates applications in interactive editing and data-driven image manipulation.
Abstract
Understanding, reasoning, and manipulating semantic concepts of images have been a fundamental research problem for decades. Previous work mainly focused on direct manipulation on natural image manifold through color strokes, key-points, textures, and holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our structure generator first creates pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations. Then our image generator fills in the pixel-level textures guided by the semantic layout. Such framework allows a user to manipulate images at object-level by adding, removing, and moving one bounding box at a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Vision and Imaging
