PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li, Yunheng Li, Deng-Ping Fan, Ming-Ming Cheng

TL;DR
This paper introduces PR-MIM, a novel partial reconstruction method for masked image modeling that maintains high performance while significantly reducing computational costs by reconstructing thrown tokens in a lightweight manner.
Contribution
It proposes two strategies, progressive reconstruction and furthest sampling, that keep the efficiency of partial reconstruction without sacrificing representation quality.
Findings
Achieves lossless performance with 50% of patches thrown
Saves 28% of FLOPs and 36% of memory compared to standard MAE
Validates effectiveness across various frameworks
Abstract
Masked image modeling has achieved great success in learning representations but is limited by the huge computational costs. One cost-saving strategy makes the decoder reconstruct only a subset of masked tokens and throw the others, and we refer to this method as partial reconstruction. However, it also degrades the representation quality. Previous methods mitigate this issue by throwing tokens with minimal information using temporal redundancy inaccessible for static images or attention maps that incur extra costs and complexity. To address these limitations, we propose a progressive reconstruction strategy and a furthest sampling strategy to reconstruct those thrown tokens in an extremely lightweight way instead of completely abandoning them. This approach involves all masked tokens in supervision to ensure adequate pre-training, while maintaining the cost-reduction benefits of…
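To make the idea of partial reconstruction concrete, the following is a minimal sketch of how masked patches could be split into a reconstructed subset and a thrown subset using spatially spread-out sampling. It is not the authors' implementation: the summary does not specify how the paper's furthest sampling works, so the standard greedy farthest-point sampling over 2D patch coordinates is swapped in here. The function names (`furthest_point_sample`, `split_tokens`), the 14x14 grid, the 0.75 mask ratio, and the reading of "50% thrown" as a fraction of the masked tokens are all assumptions made for illustration.

```python
import random


def furthest_point_sample(coords, k, start=0):
    """Greedy farthest-point sampling: return k indices into coords that are spread out.

    Hypothetical helper; assumes the points in coords are distinct.
    """
    if not 0 < k <= len(coords):
        raise ValueError("k must be between 1 and len(coords)")
    selected = [start]
    # min_dist[i] = squared distance from coords[i] to its nearest selected point
    min_dist = [float("inf")] * len(coords)
    while len(selected) < k:
        last_y, last_x = coords[selected[-1]]
        for i, (y, x) in enumerate(coords):
            d = (y - last_y) ** 2 + (x - last_x) ** 2
            if d < min_dist[i]:
                min_dist[i] = d
        # The next pick is the point furthest from everything selected so far.
        selected.append(max(range(len(coords)), key=min_dist.__getitem__))
    return selected


def split_tokens(grid=14, mask_ratio=0.75, throw_ratio=0.5, seed=0):
    """Split patch positions into visible, reconstructed (kept), and thrown sets.

    Assumption: throw_ratio is the fraction of *masked* patches that the
    main decoder does not reconstruct.
    """
    rng = random.Random(seed)
    coords = [(y, x) for y in range(grid) for x in range(grid)]
    rng.shuffle(coords)  # random masking, as in MAE-style pre-training
    num_masked = int(len(coords) * mask_ratio)
    masked, visible = coords[:num_masked], coords[num_masked:]
    num_kept = num_masked - int(num_masked * throw_ratio)
    kept_idx = set(furthest_point_sample(masked, num_kept))
    kept = [p for i, p in enumerate(masked) if i in kept_idx]
    thrown = [p for i, p in enumerate(masked) if i not in kept_idx]
    return visible, kept, thrown


if __name__ == "__main__":
    visible, kept, thrown = split_tokens()
    print(len(visible), len(kept), len(thrown))
```

In this sketch the thrown positions are only identified, not processed. According to the abstract, PR-MIM still reconstructs them in an extremely lightweight way so that all masked tokens take part in supervision; how that lightweight path is built is not described in this summary and is left out here.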
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Softmax · Attention Is All You Need · Masked autoencoder
