LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

Saksham Suri; Matthew Walmer; Kamal Gupta; Abhinav Shrivastava

arXiv:2403.14625 · cs.CV · October 30, 2024 · 1 citation

TL;DR

LiFT is a simple, fast, self-supervised postprocessing method that enhances dense features of pre-trained ViT models, improving performance on various downstream tasks with minimal additional inference cost.

Contribution

The paper introduces LiFT, a lightweight, self-supervised feature transform that improves dense ViT features for downstream tasks without complex training or significant computational overhead.

Findings

01. LiFT improves keypoint correspondence, detection, and segmentation performance.

02. LiFT enhances scale invariance and object boundary detection.

03. LiFT can be integrated with existing downstream modules like ViTDet.

Abstract

We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features for minimal extra inference cost. Furthermore, we demonstrate that LiFT can be applied with approaches that use additional task-specific downstream modules, as we integrate LiFT with ViTDet for COCO detection and segmentation. Despite the simplicity of LiFT, we find that it is not simply learning a more complex version of bilinear interpolation. Instead, our LiFT training protocol leads to several desirable emergent properties that benefit ViT features in dense downstream tasks. This includes…
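The abstract contrasts LiFT with bilinear interpolation, the usual way to make a coarse ViT patch grid denser. For reference, here is a minimal pure-Python sketch of that baseline only; it is not LiFT or the authors' code. The function name `bilinear_upsample`, the `factor` parameter, and the corner-aligned coordinate mapping are illustrative assumptions.

```python
def bilinear_upsample(feats, factor=2):
    """Bilinearly upsample an H x W grid of C-dim feature vectors.

    `feats` is a nested list shaped [H][W][C]. This is the plain
    interpolation baseline, not the LiFT network (hypothetical helper).
    """
    h, w = len(feats), len(feats[0])
    c = len(feats[0][0])
    out_h, out_w = h * factor, w * factor

    def src_coord(i, n_in, n_out):
        # Map an output index to a fractional input coordinate so that
        # the first and last samples coincide (corner-aligned; assumed).
        return i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0

    out = []
    for i in range(out_h):
        y = src_coord(i, h, out_h)
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, w, out_w)
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            # Blend the four neighbouring patch features channel by channel.
            vec = [
                (1 - wy) * ((1 - wx) * feats[y0][x0][k] + wx * feats[y0][x1][k])
                + wy * ((1 - wx) * feats[y1][x0][k] + wx * feats[y1][x1][k])
                for k in range(c)
            ]
            row.append(vec)
        out.append(row)
    return out


# Toy 2 x 2 grid of 1-dim "patch features".
grid = [[[0.0], [1.0]], [[2.0], [3.0]]]
dense = bilinear_upsample(grid, factor=2)
print(len(dense), len(dense[0]))
```

Interpolation of this kind can only blend existing patch features with fixed weights, whereas the abstract describes LiFT as a trained postprocessing network, which is why the authors stress that it is not simply learning a more complex version of bilinear interpolation.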

Code & Models

Repositories

havrylovv/isegprobe (PyTorch)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization