Test-Time Canonicalization by Foundation Models for Robust Perception

Utkarsh Singhal; Ryan Feng; Stella X. Yu; Atul Prakash

arXiv:2507.10375 · cs.CV · September 17, 2025

TL;DR

FOCAL is a test-time framework that makes perception more robust by searching over transformed versions of the input and selecting the most typical view, as scored by foundation model priors. It requires no retraining or architectural changes, and it handles 2D and 3D rotations, contrast and lighting shifts, and day-night changes.

Contribution

Introduces FOCAL, a novel test-time optimization method inspired by mental rotation, improving robustness across diverse viewing conditions without retraining.

Findings

01. Significantly improves robustness of models like CLIP and SAM.

02. Effective against 2D/3D rotations, lighting, and day-night shifts.

03. No retraining or architectural modifications needed.

Abstract

Perception in the real world requires robustness to diverse viewing conditions. Existing approaches often rely on specialized architectures or training with predefined data augmentations, limiting adaptability. Taking inspiration from mental rotation in human vision, we propose FOCAL, a test-time robustness framework that transforms the input into the most typical view. At inference time, FOCAL explores a set of transformed images and chooses the one with the highest likelihood under foundation model priors. This test-time optimization boosts robustness while requiring no retraining or architectural changes. Applied to models like CLIP and SAM, it significantly boosts robustness across a wide range of transformations, including 2D and 3D rotations, contrast and lighting shifts, and day-night changes. We also explore potential applications in active vision. By reframing invariance as a…
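The selection step described in the abstract (explore a set of transformed images, keep the one the prior scores as most likely) can be illustrated with a minimal sketch. This is not the authors' implementation: `canonicalize`, `toy_energy`, and the four-rotation candidate set are hypothetical names and choices made for illustration, and the toy energy is only a stand-in for a score derived from a foundation model, with lower energy meaning higher likelihood.

```python
import numpy as np

def canonicalize(image, transforms, energy):
    """Return the candidate view with the lowest energy (highest prior likelihood).

    `transforms` is a list of callables producing candidate views and `energy`
    is any scalar scoring function; both are assumptions of this sketch.
    """
    candidates = [t(image) for t in transforms]
    scores = np.array([energy(c) for c in candidates])
    best = int(np.argmin(scores))
    return candidates[best], best, scores

# Hypothetical candidate set: the four 90-degree rotations of the input.
rotations = [lambda x, k=k: np.rot90(x, k) for k in range(4)]

def toy_energy(x):
    # Stand-in prior (NOT a foundation model): treats images that are
    # brighter in the top half as more "typical", so they get lower energy.
    h = x.shape[0]
    return float(x[h // 2:].mean() - x[: h // 2].mean())

# An "upright" image with a bright top half, then a rotated copy of it.
upright = np.vstack([np.ones((4, 8)), np.zeros((4, 8))])
rotated = np.rot90(upright, 1)

canonical, idx, scores = canonicalize(rotated, rotations, toy_energy)
```

In this toy setup the search recovers the upright view from the rotated input. In a real pipeline the canonicalized image would then be passed unchanged to the downstream model (for example CLIP or SAM), which is why no retraining or architectural change is involved.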

Code & Models

Repositories

sutkarsh/focal
PyTorch · Official

Videos

Test-Time Canonicalization by Foundation Models for Robust Perception · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Industrial Vision Systems and Defect Detection · Image Processing Techniques and Applications

Methods: Segment Anything Model · Contrastive Language-Image Pre-training