The Geometry of Robustness: Optimizing Loss Landscape Curvature and Feature Manifold Alignment for Robust Finetuning of Vision-Language Models

Shivang Chopra; Shaunak Halbe; Chengyue Huang; Brisa Maneechotesuwan; Zsolt Kira

arXiv:2603.27139 · cs.CV · April 7, 2026

TL;DR

GRACE is a unified fine-tuning framework for vision-language models that improves both robustness and accuracy by jointly regularizing parameter-space curvature and feature-space invariance, addressing two geometric failures behind the robustness trade-off.

Contribution

The paper introduces GRACE, a method that combines curvature regularization with feature alignment, grounded in Robust PAC-Bayes theory, to improve robustness and accuracy when fine-tuning VLMs.

Findings

01 Improves ID accuracy on ImageNet by 10.8%.

02 Improves adversarial robustness by 13.5%.

03 Maintains OOD accuracy comparable to the zero-shot baseline.

Abstract

Fine-tuning approaches for Vision-Language Models (VLMs) face a critical three-way trade-off between In-Distribution (ID) accuracy, Out-of-Distribution (OOD) generalization, and adversarial robustness. Existing robust fine-tuning strategies resolve at most two axes of this trade-off. Generalization-preserving methods retain ID/OOD performance but leave models vulnerable to adversarial attacks, while adversarial training improves robustness to targeted attacks but degrades ID/OOD accuracy. Our key insight is that the robustness trade-off stems from two geometric failures: sharp, anisotropic minima in parameter space and unstable feature representations that deform under perturbation. To address this, we propose GRACE (Gram-aligned Robustness via Adaptive Curvature Estimation), a unified fine-tuning framework that jointly regularizes the parameter-space curvature and feature-space…
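
The abstract attributes part of the trade-off to "sharp, anisotropic minima." To make those two terms concrete, here is a minimal sketch on a toy quadratic loss, assuming the standard reading that sharpness is the largest Hessian eigenvalue and anisotropy is the ratio between the largest and smallest eigenvalues. It is a generic curvature diagnostic, not GRACE's curvature estimator or objective; all function names (hvp_quadratic, top_eigenvalue, curvature_summary, worst_case_increase) are hypothetical.

```python
import numpy as np

# Toy setting (assumption for illustration): near a minimum at w = 0 the loss is
# L(w) = 0.5 * w^T H w, so the Hessian equals H everywhere.

def hvp_quadratic(H, v):
    """Hessian-vector product for the toy quadratic loss."""
    return H @ v

def top_eigenvalue(hvp, dim, iters=200, seed=0):
    """Estimate the largest Hessian eigenvalue (a common sharpness proxy) by power
    iteration, touching the Hessian only through Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(v)
        lam = float(v @ hv)            # Rayleigh quotient for the unit vector v
        v = hv / np.linalg.norm(hv)    # renormalize toward the top eigenvector
    return lam

def curvature_summary(H):
    """Return (sharpness, anisotropy) for a symmetric positive-definite Hessian."""
    eig = np.linalg.eigvalsh(H)        # eigenvalues in ascending order
    sharpness = eig[-1]                # curvature along the steepest direction
    anisotropy = eig[-1] / eig[0]      # condition number: spread across directions
    return float(sharpness), float(anisotropy)

def worst_case_increase(H, rho):
    """Largest loss increase from any weight perturbation of norm rho at the minimum."""
    return 0.5 * rho**2 * float(np.linalg.eigvalsh(H)[-1])

flat = np.eye(3)                         # flat, isotropic minimum
sharp = np.diag([100.0, 1.0, 0.01])      # sharp, anisotropic minimum

for name, H in [("flat", flat), ("sharp", sharp)]:
    s, a = curvature_summary(H)
    est = top_eigenvalue(lambda v: hvp_quadratic(H, v), dim=3)
    print(f"{name}: sharpness={s:.2f} (power iteration {est:.2f}), "
          f"anisotropy={a:.0f}, worst-case increase at rho=0.1: "
          f"{worst_case_increase(H, 0.1):.4f}")
```

For the identity Hessian, sharpness and anisotropy are both 1 and a weight perturbation of norm 0.1 raises the loss by at most 0.005; for diag(100, 1, 0.01), sharpness is about 100, anisotropy about 10^4, and the same perturbation can raise the loss by up to 0.5. Power iteration is used here because it needs only Hessian-vector products, which is the usual way to probe curvature in large networks where the full Hessian cannot be formed.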
