Keypoint Aware Masked Image Modelling

Madhava Krishna; A V Subramanyam

arXiv:2407.13873 · cs.CV · January 3, 2025

TL;DR

KAMIM adds keypoint-based patch weighting to SimMIM-style masked image modeling, giving the reconstruction objective more local context and raising top-1 linear probing accuracy on ImageNet-1K with a ViT-B from 16.12% to 33.97%.

Contribution

The paper introduces KAMIM, a patch-wise weighting method derived from keypoint features that improves SimMIM's masked image modeling, most notably for linear probing.

Findings

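1. Top-1 linear probing accuracy on ImageNet-1K with a ViT-B rose from 16.12% to 33.97% for the same number of training epochs.

2. Finetuning accuracy in the same setting improved slightly, from 76.78% to 77.3%.

3. Patch-wise weighting improves linear probing performance for larger pretraining datasets.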

Abstract

SimMIM is a widely used method for pretraining vision transformers using masked image modeling. However, despite its success in fine-tuning performance, it has been shown to perform sub-optimally when used for linear probing. We propose an efficient patch-wise weighting derived from keypoint features which captures the local information and provides better context during SimMIM's reconstruction phase. Our method, KAMIM, improves the top-1 linear probing accuracy from 16.12% to 33.97%, and finetuning accuracy from 76.78% to 77.3% when tested on the ImageNet-1K dataset with a ViT-B when trained for the same number of epochs. We conduct extensive testing on different datasets, keypoint extractors, and model architectures and observe that patch-wise weighting augments linear probing performance for larger pretraining datasets. We also analyze the learned representations of a ViT-B trained…
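
As a rough illustration of the idea in the abstract, the following minimal Python sketch shows how keypoint locations might be turned into per-patch weights and applied to a SimMIM-style L1 reconstruction loss over masked patches. The function names, the exponential weighting with a `temperature` parameter, and the peak-normalized keypoint density are all assumptions made for illustration; the abstract does not give the formula, so the official repository remains the reference for the actual method. Keypoints are taken as given (x, y) pixel coordinates from any extractor.

```python
from math import exp


def patch_keypoint_counts(keypoints, image_size, patch_size):
    """Count keypoints falling in each patch of a square image grid."""
    grid = image_size // patch_size
    counts = [[0] * grid for _ in range(grid)]
    for x, y in keypoints:
        # Clamp so points on the far border land in the last patch.
        col = min(int(x) // patch_size, grid - 1)
        row = min(int(y) // patch_size, grid - 1)
        counts[row][col] += 1
    return counts


def patch_weights(counts, temperature=1.0):
    """Hypothetical weighting: exp(density / T), with density in [0, 1].

    Patches without keypoints get weight 1, so they still contribute.
    """
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[1.0 for _ in row] for row in counts]
    return [[exp((c / peak) / temperature) for c in row] for row in counts]


def weighted_masked_l1(per_patch_l1, mask, weights):
    """Average the weighted per-patch L1 error over masked patches only."""
    total, n_masked = 0.0, 0
    for err_row, mask_row, w_row in zip(per_patch_l1, mask, weights):
        for err, masked, w in zip(err_row, mask_row, w_row):
            if masked:
                total += w * err
                n_masked += 1
    return total / n_masked if n_masked else 0.0


# Toy example: an 8x8 image split into a 2x2 grid of 4x4 patches.
counts = patch_keypoint_counts([(1, 1), (2, 3), (5, 1)], image_size=8, patch_size=4)
weights = patch_weights(counts, temperature=1.0)
loss = weighted_masked_l1(
    per_patch_l1=[[0.5, 0.5], [0.5, 0.5]],  # placeholder reconstruction errors
    mask=[[1, 0], [1, 0]],                  # 1 = patch was masked
    weights=weights,
)
```

In this toy setup the keypoint-dense top-left patch contributes more to the loss than the keypoint-free bottom-left patch, which is the intended effect of weighting reconstruction toward locally informative regions. Lowering `temperature` would sharpen that contrast.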

Code & Models

Repositories

madhava20217/kamim (PyTorch, official)

Taxonomy

Topics: Computer Graphics and Visualization Techniques

Methods: Softmax · Attention Is All You Need · Contrastive Learning