CSF-Net: Context-Semantic Fusion Network for Large Mask Inpainting
Chae-Yeon Heo, Yeong-Jun Cho

TL;DR
CSF-Net is a transformer-based framework that uses semantic priors from a pretrained amodal completion model to improve large-mask image inpainting, enhancing structural accuracy and semantic consistency.
Contribution
The paper introduces a semantic-guided fusion network that integrates structure-aware candidates from an amodal completion model with contextual features to improve inpainting quality.
Findings
Reduces object hallucination in inpainting results.
Enhances visual realism and semantic alignment.
Consistently improves performance across diverse masking conditions.
Abstract
In this paper, we propose a semantic-guided framework to address the challenging problem of large-mask image inpainting, where essential visual content is missing and contextual cues are limited. To compensate for the limited context, we leverage a pretrained Amodal Completion (AC) model to generate structure-aware candidates that serve as semantic priors for the missing regions. We introduce the Context-Semantic Fusion Network (CSF-Net), a transformer-based framework that fuses these candidates with contextual features to produce a semantic guidance image for image inpainting. This guidance improves inpainting quality by promoting structural accuracy and semantic consistency. CSF-Net can be seamlessly integrated into existing inpainting models without architectural changes and consistently enhances performance across diverse masking conditions. Extensive experiments on the Places365…
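The data flow described in the abstract (AC candidates, then fusion into a guidance image, then a guided inpainter) can be sketched as below. This is a minimal illustration and not the authors' implementation: the names `fuse_candidates` and `inpaint_with_guidance`, the list-based image type, and the per-pixel averaging that stands in for the transformer-based fusion are all assumptions made for the example.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[int]]     # 1 = missing pixel, 0 = known pixel

# Hypothetical interfaces (assumptions, not from the paper):
#   an AC model maps (image, mask) to a list of candidate images,
#   an inpainter maps (image, mask, guidance) to a completed image.
AmodalCompletion = Callable[[Image, Mask], List[Image]]
Inpainter = Callable[[Image, Mask, Image], Image]


def fuse_candidates(image: Image, mask: Mask, candidates: List[Image]) -> Image:
    """Placeholder for CSF-Net: build a guidance image from the candidates.

    Known pixels are copied from the input. Missing pixels take the mean
    of the candidates. The real model is a learned transformer; the mean
    is only a stand-in so the pipeline can run end to end.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    guidance = [row[:] for row in image]
    for y, mask_row in enumerate(mask):
        for x, missing in enumerate(mask_row):
            if missing == 1:
                guidance[y][x] = sum(c[y][x] for c in candidates) / len(candidates)
    return guidance


def inpaint_with_guidance(
    image: Image,
    mask: Mask,
    amodal_completion: AmodalCompletion,
    inpainter: Inpainter,
) -> Image:
    """Run the three stages: candidates, guidance image, guided inpainting."""
    candidates = amodal_completion(image, mask)
    guidance = fuse_candidates(image, mask, candidates)
    # The inpainter itself is left unchanged; it only receives the
    # guidance image as an additional input.
    return inpainter(image, mask, guidance)
```

Passing the inpainter in as a plain callable mirrors the abstract's claim that the guidance can be added to an existing inpainting model without architectural changes, although how the guidance is actually consumed by a given model is not specified here.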
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Advanced Image Fusion Techniques
