Virtual Multi-Modality Self-Supervised Foreground Matting for Human-Object Interaction
Bo Xu, Han Huang, Cheng Lu, Ziwen Li, Yandong Guo

TL;DR
This paper introduces a self-supervised, multi-modality approach to human-object interaction foreground matting from a raw RGB image, eliminating the need for additional inputs such as trimaps or known backgrounds.
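For context, matting methods conventionally start from the compositing equation below. It is the standard formulation in the matting literature, not something taken from this paper. Each pixel $p$ of the observed image $I$ is modeled as a blend of a foreground colour $F_p$ and a background colour $B_p$, weighted by the alpha matte $\alpha_p$:

```latex
I_p = \alpha_p F_p + (1 - \alpha_p)\, B_p, \qquad \alpha_p \in [0, 1]
```

For an RGB pixel this gives 3 observed values but 7 unknowns (3 for $F_p$, 3 for $B_p$, 1 for $\alpha_p$). That ambiguity is why many methods ask for a trimap or a known background to constrain the problem; VMFM aims to work from the RGB image alone.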
Contribution
It reformulates foreground matting as a self-supervised multi-modality problem and proposes a novel Complementary Learning method to improve accuracy without labeled data.
Findings
Outperforms state-of-the-art methods in human-object foreground matting.
Effectively utilizes depth, segmentation, and interaction heatmap modalities.
Self-supervised learning reduces reliance on labeled training data.
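The findings mention three auxiliary modalities: depth, segmentation, and an interaction heatmap. One common way to feed such maps to a network is to stack them as extra input channels next to RGB. The summary does not say whether VMFM combines modalities this way, so treat the following as a minimal sketch under that assumption. `fake_depth`, `fake_segmentation`, `fake_heatmap`, and `stack_modalities` are hypothetical placeholders standing in for the paper's learned auto-encoders, not its actual models.

```python
import numpy as np

# Hypothetical stand-ins for the three learned auto-encoders. Each maps an
# HxWx3 RGB image in [0, 1] to an HxW map; real models would be trained nets.
def fake_depth(rgb: np.ndarray) -> np.ndarray:
    # Placeholder: luminance used as a dummy "depth" signal.
    return rgb @ np.array([0.299, 0.587, 0.114])

def fake_segmentation(rgb: np.ndarray) -> np.ndarray:
    # Placeholder: threshold luminance to get a dummy binary mask.
    return (fake_depth(rgb) > 0.5).astype(rgb.dtype)

def fake_heatmap(rgb: np.ndarray) -> np.ndarray:
    # Placeholder: a centred Gaussian blob as a dummy interaction heatmap.
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx, sigma = (h - 1) / 2, (w - 1) / 2, max(h, w) / 4
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

def stack_modalities(rgb: np.ndarray, estimators) -> np.ndarray:
    """Concatenate RGB with one channel per estimated modality: HxWx(3+k)."""
    maps = [est(rgb)[..., None] for est in estimators]
    return np.concatenate([rgb] + maps, axis=-1)

rgb = np.random.default_rng(0).random((4, 5, 3))
x = stack_modalities(rgb, [fake_depth, fake_segmentation, fake_heatmap])
print(x.shape)  # (4, 5, 6)
```

Channel stacking keeps the modalities spatially aligned with the image, which is the main reason it is a popular default; a dual-branch design could instead route different modalities to different encoders.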
Abstract
Most existing human matting algorithms try to separate a pure human-only foreground from the background. In this paper, we propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn the human-object interactive foreground (the human and the objects he or she interacts with) from a raw RGB image. The VMFM method requires no additional inputs, e.g., a trimap or a known background. We reformulate foreground matting as a self-supervised multi-modality problem: each input image is factored into an estimated depth map, a segmentation mask, and an interaction heatmap using three auto-encoders. To fully utilize the characteristics of each modality, we first train a dual encoder-to-decoder network to estimate the same alpha matte. Then we introduce a self-supervised method, Complementary Learning (CL), to predict a deviation probability map and exchange reliable gradients across modalities without…
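The abstract names two ingredients of Complementary Learning: a per-branch deviation probability map and an exchange of reliable gradients between modalities. The exact formulation is cut off here, so the NumPy sketch below is only one plausible reading. `complementary_loss`, the threshold `tau`, and the squared-error form are all assumptions, not the authors' method, and how the deviation maps themselves are trained is not shown.

```python
import numpy as np

def complementary_loss(alpha_a, alpha_b, dev_a, dev_b, tau=0.5):
    """Hypothetical complementary-learning loss between two branches.

    alpha_*: HxW alpha predictions in [0, 1].
    dev_*:   HxW predicted probabilities that each branch's alpha is wrong.
    tau:     reliability threshold (an assumed hyperparameter).

    Where branch A is reliable (dev_a < tau) and B is not (dev_b >= tau),
    A's alpha is a fixed pseudo-target for B, and vice versa. Targets are
    held constant (stop-gradient), so each pixel only updates the
    unreliable branch.
    """
    a_teaches_b = (dev_a < tau) & (dev_b >= tau)
    b_teaches_a = (dev_b < tau) & (dev_a >= tau)
    n = alpha_a.size
    loss = (np.sum(a_teaches_b * (alpha_b - alpha_a) ** 2)
            + np.sum(b_teaches_a * (alpha_a - alpha_b) ** 2)) / n
    # Analytic gradients of the mean squared error, teacher held constant.
    grad_a = 2 * b_teaches_a * (alpha_a - alpha_b) / n
    grad_b = 2 * a_teaches_b * (alpha_b - alpha_a) / n
    return loss, grad_a, grad_b

# Pixel 0: A is reliable, B is not. Pixel 1: B is reliable, A is not.
alpha_a = np.array([[0.9, 0.2]])
alpha_b = np.array([[0.5, 0.6]])
dev_a = np.array([[0.1, 0.8]])
dev_b = np.array([[0.7, 0.2]])
loss, grad_a, grad_b = complementary_loss(alpha_a, alpha_b, dev_a, dev_b)
print(round(loss, 4))  # 0.16
```

In an autodiff framework the same idea would be written by detaching the teacher prediction rather than deriving gradients by hand; the explicit gradients here just make visible that each pixel updates only the branch judged unreliable.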
Taxonomy
Topics: Image Enhancement Techniques · Advanced Vision and Imaging · Visual Attention and Saliency Detection
Methods: Heatmap
