RetCompletion: High-Speed Inference Image Completion with Retentive Network

Yueyang Cang; Pingge Hu; Xiaoteng Zhang; Xingtong Wang; Yuhang Liu; Li Shi

arXiv:2410.04056·cs.CV·December 5, 2024

TL;DR

RetCompletion uses a RetNet-inspired architecture for fast, high-quality pluralistic image completion, substantially reducing inference time while maintaining strong reconstruction performance.

Contribution

This paper introduces RetCompletion, a novel two-stage framework applying RetNet to image completion, achieving high speed and quality improvements over existing methods.

Findings

01. Inference speed is 10x faster than ICT.
02. Inference speed is 15x faster than RePaint.
03. RetCompletion achieves high-quality image reconstruction.

Abstract

Time cost is a major challenge in achieving high-quality pluralistic image completion. Recently, the Retentive Network (RetNet) in natural language processing offers a novel approach to this problem with its low-cost inference capabilities. Inspired by this, we apply RetNet to the pluralistic image completion task in computer vision. We present RetCompletion, a two-stage framework. In the first stage, we introduce Bi-RetNet, a bidirectional sequence information fusion model that integrates contextual information from images. During inference, we employ a unidirectional pixel-wise update strategy to restore consistent image structures, achieving both high reconstruction quality and fast inference speed. In the second stage, we use a CNN for low-resolution upsampling to enhance texture details. Experiments on ImageNet and CelebA-HQ demonstrate that our inference speed is 10 $\times$ faster…
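The low-cost inference attributed to RetNet comes from the fact that its retention operator can be evaluated either in a parallel form over the whole sequence or in a recurrent form with a fixed-size state per step. The sketch below illustrates that equivalence for a single head in plain Python. It is a simplified illustration of generic retention, not the paper's Bi-RetNet: it assumes a fixed scalar decay `gamma` and omits the position rotation, normalization, and gating used in the full RetNet design. The function names are hypothetical.

```python
# Sketch of single-head retention in its two equivalent forms.
# Assumptions: fixed scalar decay `gamma`; no position rotation,
# normalization, or gating. Inputs are lists of row vectors.

def retention_parallel(Q, K, V, gamma):
    """Quadratic form: out[n] = sum over m <= n of gamma^(n-m) * (Q[n] . K[m]) * V[m]."""
    n, d_v = len(Q), len(V[0])
    out = []
    for i in range(n):
        row = [0.0] * d_v
        for j in range(i + 1):  # causal: only positions up to i contribute
            score = sum(q * k for q, k in zip(Q[i], K[j])) * gamma ** (i - j)
            for c in range(d_v):
                row[c] += score * V[j][c]
        out.append(row)
    return out


def retention_recurrent(Q, K, V, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + K[n]^T V[n]; out[n] = Q[n] S_n."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # fixed-size state, independent of sequence length
    out = []
    for q, k, v in zip(Q, K, V):
        for a in range(d_k):
            for c in range(d_v):
                S[a][c] = gamma * S[a][c] + k[a] * v[c]
        out.append([sum(q[a] * S[a][c] for a in range(d_k)) for c in range(d_v)])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 1.0], [1.0, 0.0]]
    V = [[2.0], [3.0]]
    print(retention_parallel(Q, K, V, 0.5))   # [[2.0], [1.0]]
    print(retention_recurrent(Q, K, V, 0.5))  # [[2.0], [1.0]]
```

In the recurrent form, each new token updates the state `S` in time independent of how many tokens came before, which is the property that makes token-by-token (here, pixel-by-pixel) generation cheap compared with recomputing attention over the full prefix.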

Taxonomy

Topics: Brain Tumor Detection and Classification · AI in cancer detection · Image Retrieval and Classification Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings