Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer

Md Ashiqur Rahman; Chiao-An Yang; Michael N. Cheng; Lim Jun Hao; Jeremiah Jiang; Teck-Yian Lim; Raymond A. Yeh

arXiv:2508.14187 · cs.CV · August 21, 2025

TL;DR

This paper introduces a deep equilibrium canonicalizer (DEC) that improves the local scale equivariance of deep networks, improving both performance and local scale consistency across four pre-trained architectures on ImageNet.

Contribution

The paper proposes DEC, a novel method for improving local scale equivariance that can be incorporated into existing network architectures and adapted to pre-trained networks.
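As background on the term "canonicalizer": a canonicalizer predicts a transformation for each input. The unchanged model then runs on the input mapped to a canonical pose, and its output is mapped back to the input's frame. This is what lets a canonicalization-based method wrap an existing or pre-trained network. The sketch below is a toy, hypothetical illustration that uses circular shifts of a 1D list, because shifts act exactly on discrete signals. It is not the authors' DEC, which targets local scale changes using a deep equilibrium canonicalizer. The names `roll`, `canonical_shift`, `make_equivariant`, and `toy_model` are illustrative only, and the canonicalizer assumes the input has a unique maximum.

```python
from itertools import accumulate

def roll(signal, k):
    """Circularly shift a list left by k positions (negative k shifts right)."""
    k %= len(signal)
    return signal[k:] + signal[:k]

def canonical_shift(signal):
    """Hypothetical canonicalizer: the left shift that moves the maximum to index 0.
    Assumes the maximum is unique; with ties the canonical pose is ambiguous."""
    return signal.index(max(signal))

def make_equivariant(model):
    """Wrap an unchanged sequence-to-sequence `model` so that it commutes with roll."""
    def wrapped(signal):
        k = canonical_shift(signal)
        out = model(roll(signal, k))  # run the model in the canonical frame
        return roll(out, -k)          # map the output back to the input's frame
    return wrapped

def toy_model(s):
    # Prefix sums: position-dependent, so NOT shift-equivariant on its own.
    return list(accumulate(s))

x = [0.1, 0.9, 0.3, 0.2]
j = 2
eq_model = make_equivariant(toy_model)
print(toy_model(roll(x, j)) == roll(toy_model(x), j))  # False
print(eq_model(roll(x, j)) == roll(eq_model(x), j))    # True
```

The wrapped model is equivariant because shifting the input by j changes the canonical shift by exactly j, so the model always sees the same canonical list and only the final un-shift differs.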

Findings

1. DEC improves model performance on ImageNet.

2. DEC enhances local scale consistency across different architectures.

3. DEC can be applied to pre-trained models such as ViT, DeiT, Swin, and BEiT.

Abstract

Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by their distance from the camera. These variations are local to the objects, i.e., the sizes of different objects may change differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, i.e., ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.
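The abstract does not spell out how DEC works internally. For background, a deep equilibrium model defines a layer's output implicitly as a fixed point z* = f(z*, x) rather than as the result of a fixed stack of layers. The sketch below is a generic, hypothetical illustration of that idea, not the DEC implementation in the linked repository. It solves for z* by naive fixed-point iteration on a toy layer f(z, x) = tanh(Wz + Ux), where `layer`, `fixed_point`, and the random weights are illustrative assumptions. The weights are rescaled so that the map is a contraction, which guarantees a unique fixed point. Practical DEQ implementations typically use faster solvers such as Broyden's method or Anderson acceleration.

```python
import math
import random

def layer(z, x, W, U):
    """Toy implicit layer f(z, x) = tanh(W z + U x); W and U are lists of rows."""
    out = []
    for i in range(len(z)):
        pre = sum(W[i][j] * z[j] for j in range(len(z)))
        pre += sum(U[i][k] * x[k] for k in range(len(x)))
        out.append(math.tanh(pre))
    return out

def fixed_point(x, W, U, tol=1e-10, max_iter=1000):
    """Naive fixed-point iteration z <- f(z, x), starting from z = 0."""
    z = [0.0] * len(W)
    for _ in range(max_iter):
        z_next = layer(z, x, W, U)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

# Usage: random weights, rescaled so each row of W has absolute sum 0.5.
# Since tanh is 1-Lipschitz, f is then a contraction (factor 0.5) in the max-norm.
random.seed(0)
n, d = 4, 3
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
W = [[w * 0.5 / sum(abs(v) for v in row) for w in row] for row in W]
U = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
x = [0.3, -1.2, 0.7]

z_star = fixed_point(x, W, U)
residual = max(abs(a - b) for a, b in zip(layer(z_star, x, W, U), z_star))
print(residual < 1e-9)  # True
```

Because the output is defined by an equilibrium condition rather than by depth, the same layer can be iterated until convergence. This is the property the "deep equilibrium" in DEC's name refers to.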

Code & Models

Models

ashiq24/dinov2-base-lse (model, hosted on Hugging Face)

Datasets

ashiq24/Multi_Scale_ImageNet (dataset)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis