LipSim: A Provably Robust Perceptual Similarity Metric
Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Farshad Khorrami, Siddharth Garg

TL;DR
This paper introduces LipSim, a new perceptual similarity metric that is provably robust against adversarial attacks by using Lipschitz neural networks, improving reliability in visual similarity assessments.
Contribution
The paper proposes LipSim, a novel perceptual similarity metric based on 1-Lipschitz neural networks, with provable robustness guarantees against adversarial perturbations.
Findings
LipSim outperforms existing metrics in natural and certified scores.
LipSim provides robustness certificates for all perturbations within an $\ell_2$ ball.
Experimental results demonstrate LipSim's effectiveness in image retrieval tasks.
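To make the idea of an $\ell_2$ certificate concrete, here is a minimal sketch of how a 1-Lipschitz feature extractor yields a certified radius for a two-alternative forced choice (2AFC) decision. It is a simplified illustration, not the paper's construction: it assumes a plain Euclidean distance between feature vectors (the function names `l2_distance`, `two_afc_choice`, and `certified_radius` are hypothetical), and it assumes only the reference image is perturbed. If f is 1-Lipschitz, a perturbation of norm at most eps moves f(x) by at most eps, so each of the two distances changes by at most eps and their difference by at most 2 * eps. The choice therefore cannot flip while eps is below half the margin.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def two_afc_choice(f_ref, f_x0, f_x1):
    """Return 0 if x0 is at least as close to the reference as x1, else 1.

    Inputs are feature vectors, assumed to come from a 1-Lipschitz extractor.
    """
    return 0 if l2_distance(f_ref, f_x0) <= l2_distance(f_ref, f_x1) else 1


def certified_radius(f_ref, f_x0, f_x1):
    """Radius below which the 2AFC choice provably cannot change.

    Assumption: only the reference input is perturbed, and the extractor is
    1-Lipschitz in the l2 norm, so each distance moves by at most eps.
    """
    margin = abs(l2_distance(f_ref, f_x0) - l2_distance(f_ref, f_x1))
    return margin / 2.0


if __name__ == "__main__":
    # Toy feature vectors (made up for illustration only).
    f_ref, f_x0, f_x1 = [0.0, 0.0], [1.0, 0.0], [-3.0, 0.0]
    print("choice:", two_afc_choice(f_ref, f_x0, f_x1))
    print("certified radius:", certified_radius(f_ref, f_x0, f_x1))
```

The radius is computed from a single forward pass on the three inputs, which is what distinguishes a certificate of this kind from an empirical robustness estimate obtained by running attacks.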
Abstract
Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the established vulnerability of neural networks to adversarial attacks. It is indeed logical to infer that perceptual metrics may inherit both the strengths and shortcomings of neural networks. In this work, we demonstrate the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric) with provable…
Peer Reviews
Decision: ICLR 2024 poster
The experimental results look very good against AutoAttack. The proof seems to make sense, but I am not an expert in this area.
The presentation could be better. For example, the explanation of 2AFC is too brief, which makes it difficult to follow the paper's message. The experiments were conducted only with AutoAttack. However, there are other kinds of attacks, and it would be good to experiment with other attack methods as well. It would also be good to compare with other certified or non-certified defense methods.
- LipSim employs a 1-Lipschitz network backbone, which, when combined with certain design choices, greatly enhances its resistance to adversarial perturbations.
- When applied to the real-world task of image retrieval, LipSim effectively identified semantically close images, even when faced with adversarial image queries.
- LipSim excels in both empirical and certified robustness tests. This dual proficiency ensures that the metric's performance is not only observed in experimental conditions bu
- The natural score of LipSim was observed to be lower than that of some competitors like DreamSim. This might raise concerns about its general performance when not under adversarial conditions.
- The real-world application testing of LipSim was primarily on image retrieval. It would be beneficial to see its performance on a wider variety of tasks.
The presented method is the first of its kind to provide a provably robust perceptual metric. The authors have shown that this method is more robust than existing perceptual metrics (DreamSim). Several experiments are conducted to show the efficacy of the method. The paper is, in general, well organized and well written.
Although there has been some previous work on robust perceptual metrics, the authors claim theirs is the first with provable guarantees. I still think it is important to discuss why this matters in practice. I see the application to image retrieval, but how would someone gain white-box access to the model in order to actually attack it? Please elaborate. Consider changing the colors in the bar plot of Fig. 3.a. They are hard to distinguish.
Taxonomy
Topics: Visual Attention and Saliency Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Sparse Evolutionary Training
