PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling

Zhong-Yu Li; Yunheng Li; Deng-Ping Fan; Ming-Ming Cheng

arXiv:2411.15746 · cs.CV · November 26, 2024

Open Access

TL;DR

This paper introduces PR-MIM, a novel partial reconstruction method for masked image modeling that maintains high performance while significantly reducing computational costs by reconstructing thrown tokens in a lightweight manner.

Contribution

It proposes a progressive reconstruction strategy and a furthest sampling strategy that reconstruct the thrown tokens in a lightweight way, keeping the cost savings of partial reconstruction without sacrificing representation quality.

Findings

1. Achieves lossless performance with 50% of patches thrown.

2. Saves 28% of FLOPs and 36% of memory compared to standard MAE.

3. Validates its effectiveness across various frameworks.

Abstract

Masked image modeling has achieved great success in learning representations but is limited by the huge computational costs. One cost-saving strategy makes the decoder reconstruct only a subset of masked tokens and throw the others, and we refer to this method as partial reconstruction. However, it also degrades the representation quality. Previous methods mitigate this issue by throwing tokens with minimal information using temporal redundancy inaccessible for static images or attention maps that incur extra costs and complexity. To address these limitations, we propose a progressive reconstruction strategy and a furthest sampling strategy to reconstruct those thrown tokens in an extremely lightweight way instead of completely abandoning them. This approach involves all masked tokens in supervision to ensure adequate pre-training, while maintaining the cost-reduction benefits of…
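The bookkeeping behind partial reconstruction, as the abstract describes it, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `split_tokens`, the parameter `throw_ratio` (taken here as a fraction of all patches), and the uniform random choice of thrown tokens are assumptions. It does not model the paper's progressive reconstruction or furthest sampling strategies.

```python
import random


def split_tokens(num_patches, mask_ratio=0.75, throw_ratio=0.5, seed=0):
    """Partition patch indices into visible, reconstructed, and thrown sets.

    mask_ratio:  fraction of all patches hidden from the encoder.
    throw_ratio: fraction of all patches the decoder skips entirely
                 (hypothetical parameterization, assumed for illustration).
    """
    if not 0.0 <= throw_ratio <= mask_ratio <= 1.0:
        raise ValueError("need 0 <= throw_ratio <= mask_ratio <= 1")

    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random selection; an assumption, not the paper's sampler

    num_masked = int(num_patches * mask_ratio)
    num_thrown = int(num_patches * throw_ratio)
    num_visible = num_patches - num_masked

    visible = sorted(order[:num_visible])                         # fed to the encoder
    thrown = sorted(order[num_visible:num_visible + num_thrown])  # skipped by the decoder
    reconstructed = sorted(order[num_visible + num_thrown:])      # decoded and supervised
    return visible, reconstructed, thrown


# A 14x14 patch grid (196 patches), as in a 224x224 image with 16x16 patches.
visible, reconstructed, thrown = split_tokens(196)
print(len(visible), len(reconstructed), len(thrown))  # 49 49 98
```

In this sketch the thrown tokens receive no supervision at all, which corresponds to the plain partial reconstruction baseline that the abstract says degrades representation quality.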

Taxonomy

Topics: Advanced Numerical Analysis Techniques · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Softmax · Attention Is All You Need · Masked autoencoder