RetCompletion: High-Speed Inference Image Completion with Retentive Network
Yueyang Cang, Pingge Hu, Xiaoteng Zhang, Xingtong Wang, Yuhang Liu, Li Shi

TL;DR
RetCompletion uses a RetNet-inspired architecture for fast, high-quality pluralistic image completion, significantly reducing inference time while maintaining strong reconstruction performance.
Contribution
This paper introduces RetCompletion, a novel two-stage framework applying RetNet to image completion, achieving high speed and quality improvements over existing methods.
Findings
Inference speed is 10x faster than ICT.
Inference speed is 15x faster than RePaint.
RetCompletion achieves high-quality image reconstruction.
Abstract
Time cost is a major challenge in achieving high-quality pluralistic image completion. Recently, the Retentive Network (RetNet) in natural language processing offers a novel approach to this problem with its low-cost inference capabilities. Inspired by this, we apply RetNet to the pluralistic image completion task in computer vision. We present RetCompletion, a two-stage framework. In the first stage, we introduce Bi-RetNet, a bidirectional sequence information fusion model that integrates contextual information from images. During inference, we employ a unidirectional pixel-wise update strategy to restore consistent image structures, achieving both high reconstruction quality and fast inference speed. In the second stage, we use a CNN for low-resolution upsampling to enhance texture details. Experiments on ImageNet and CelebA-HQ demonstrate that our inference speed is 10x faster…
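The "low-cost inference" the abstract attributes to RetNet comes from the fact that retention can be evaluated either over the whole sequence at once or token by token with a fixed-size state. The sketch below illustrates that equivalence for a single simplified head. It follows the general RetNet formulation and is not RetCompletion's Bi-RetNet: the function names, the scalar decay `gamma`, and the omission of positional rotation and normalization are all assumptions made for illustration.

```python
import random

# Simplified single-head retention. Hypothetical helper names; this is an
# illustrative sketch, not code from the RetCompletion paper.


def retention_parallel(Q, K, V, gamma):
    """Whole-sequence form: o_n = sum over m <= n of gamma^(n-m) * (q_n . k_m) * v_m."""
    d_v = len(V[0])
    out = []
    for n in range(len(Q)):
        o = [0.0] * d_v
        for m in range(n + 1):
            score = sum(q * k for q, k in zip(Q[n], K[m])) * gamma ** (n - m)
            for j in range(d_v):
                o[j] += score * V[m][j]
        out.append(o)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Step-by-step form: S_n = gamma * S_(n-1) + k_n^T v_n, then o_n = q_n S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = gamma * S[i][j] + k[i] * v[j]
        out.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


def max_abs_diff(A, B):
    """Largest element-wise absolute difference between two equally shaped matrices."""
    return max(abs(a - b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


if __name__ == "__main__":
    random.seed(0)
    n_tokens, d = 6, 4
    Q = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n_tokens)]
    K = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n_tokens)]
    V = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n_tokens)]
    diff = max_abs_diff(retention_parallel(Q, K, V, 0.9), retention_recurrent(Q, K, V, 0.9))
    print("forms agree:", diff < 1e-9)
```

In the recurrent form each new token costs the same amount of work regardless of how many tokens came before, which is the property that makes a pixel-by-pixel update at inference time cheap compared with recomputing attention over the full sequence.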
Taxonomy
Topics: Brain Tumor Detection and Classification · AI in cancer detection · Image Retrieval and Classification Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
