Bridging Composite and Real: Towards End-to-end Deep Image Matting
Jizhizi Li, Jing Zhang, Stephen J. Maybank, Dacheng Tao

TL;DR
This paper introduces a novel end-to-end deep image matting network that separates semantic and detail processing, improving accuracy and generalization on real-world images, especially for complex foregrounds like animals and portraits.
Contribution
The paper proposes the Glance and Focus Matting (GFM) network, which uses a shared encoder and two separate decoders, and a composition route, RSSN, that bridges the domain gap between composite and real images.
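For context, composite training images in matting are conventionally built by alpha-blending a foreground over a background, I = αF + (1 − α)B. The sketch below shows only that standard blend in pure Python; the function name `composite` and the three-pixel toy data are illustrative assumptions, and this is not the paper's RSSN route.

```python
def composite(foreground, background, alpha):
    """Blend per pixel: I = alpha * F + (1 - alpha) * B.

    All three arguments are flat lists of floats in [0, 1] with equal length.
    """
    if not (len(foreground) == len(background) == len(alpha)):
        raise ValueError("inputs must have the same length")
    return [a * f + (1.0 - a) * b
            for f, b, a in zip(foreground, background, alpha)]


# Toy data (hypothetical): a white foreground over a black background.
fg = [1.0, 1.0, 1.0]
bg = [0.0, 0.0, 0.0]
alpha = [1.0, 0.5, 0.0]  # opaque, half-transparent, fully transparent
blended = composite(fg, bg, alpha)
```

Here `blended` holds the foreground value where alpha is 1, the background value where alpha is 0, and an even mix in between.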
Findings
GFM outperforms state-of-the-art methods in accuracy.
Addressing the domain gap between composite and real images improves generalization to real-world images.
A new benchmark with 2,000 animal and 10,000 portrait images is provided.
Abstract
Extracting accurate foregrounds from natural images benefits many downstream applications such as film production and augmented reality. However, the furry characteristics and various appearance of the foregrounds, e.g., animal and portrait, challenge existing matting methods, which usually require extra user inputs such as trimap or scribbles. To resolve these problems, we study the distinct roles of semantics and details for image matting and decompose the task into two parallel sub-tasks: high-level semantic segmentation and low-level details matting. Specifically, we propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders to learn both tasks in a collaborative manner for end-to-end natural image matting. Besides, due to the limitation of available natural images in the matting task, previous methods typically adopt composite…
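One way to picture how a semantic prediction and a detail prediction could be combined into a single alpha matte is to trust the semantic branch where it is sure (foreground or background) and use the detail branch's alpha elsewhere. The sketch below assumes the semantic branch emits a three-class label per pixel (background, transition, foreground); the abstract excerpt does not spell out the merge rule, so the label encoding and the name `merge_predictions` are hypothetical.

```python
# Hypothetical label encoding for the semantic (coarse) prediction.
BACKGROUND, TRANSITION, FOREGROUND = 0, 1, 2


def merge_predictions(semantic_labels, detail_alpha):
    """Combine a coarse label map with fine alpha values, pixel by pixel.

    semantic_labels: flat list of ints in {0, 1, 2}.
    detail_alpha:    flat list of floats in [0, 1], same length.
    """
    if len(semantic_labels) != len(detail_alpha):
        raise ValueError("inputs must have the same length")
    merged = []
    for label, a in zip(semantic_labels, detail_alpha):
        if label == FOREGROUND:
            merged.append(1.0)   # confident foreground: fully opaque
        elif label == BACKGROUND:
            merged.append(0.0)   # confident background: fully transparent
        elif label == TRANSITION:
            merged.append(a)     # uncertain region: keep the detail alpha
        else:
            raise ValueError(f"unknown label: {label}")
    return merged


# Toy data (hypothetical): four pixels across an object boundary.
labels = [BACKGROUND, TRANSITION, TRANSITION, FOREGROUND]
detail = [0.3, 0.25, 0.75, 0.6]
matte = merge_predictions(labels, detail)
```

The detail values at the first and last pixels are ignored because the semantic labels there are already decisive; only the transition pixels keep their fine-grained alpha.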
Taxonomy
Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis
