SLANT: Spurious Logo ANalysis Toolkit
Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer

TL;DR
SLANT is a toolkit that identifies and analyzes spurious correlations between logos and model predictions in vision-language models, revealing vulnerabilities and suggesting mitigation strategies.
Contribution
We introduce SLANT, a semi-automatic toolkit for mining logos that cause spurious model correlations, highlighting new risks and defenses in vision-language models.
Findings
Logos can cause models to misclassify content as harmless or harmful.
Certain logos are correlated with negative adjectives and concepts.
Logos can be exploited as simple attacks against foundation models.
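One way to make these findings concrete is to paste a logo onto a set of images and count how often a classifier's label changes. The sketch below is illustrative only and is not SLANT's actual pipeline: `paste_logo`, `flip_rate`, and `toy_classifier` are hypothetical names, images are plain nested lists of grayscale values, and the toy classifier is a stand-in for a real vision-language model.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def paste_logo(image: Image, logo: Image, top: int, left: int) -> Image:
    """Return a copy of `image` with `logo` pasted at row `top`, column `left`."""
    if top < 0 or left < 0:
        raise ValueError("logo position must be non-negative")
    if top + len(logo) > len(image) or left + len(logo[0]) > len(image[0]):
        raise ValueError("logo does not fit inside the image")
    out = [row[:] for row in image]  # copy rows so the input stays untouched
    for r, logo_row in enumerate(logo):
        for c, value in enumerate(logo_row):
            out[top + r][left + c] = value
    return out


def flip_rate(
    classify: Callable[[Image], str],
    images: Sequence[Image],
    logo: Image,
    top: int = 0,
    left: int = 0,
) -> float:
    """Fraction of images whose predicted label changes once the logo is added."""
    if not images:
        return 0.0
    flips = sum(
        classify(img) != classify(paste_logo(img, logo, top, left))
        for img in images
    )
    return flips / len(images)


def toy_classifier(image: Image) -> str:
    # Hypothetical stand-in for a real model: it keys on the top-left pixel,
    # mimicking a model that has latched onto a logo region.
    return "harmful" if image[0][0] > 0.5 else "harmless"


images = [
    [[0.0] * 4 for _ in range(4)],  # dark image
    [[1.0] * 4 for _ in range(4)],  # bright image
]
logo = [[1.0, 1.0], [1.0, 1.0]]     # 2x2 bright patch standing in for a logo
rate = flip_rate(toy_classifier, images, logo)
```

A high flip rate for a given logo would suggest the classifier relies on that logo and not on the image content. With a real model, `classify` would wrap the model's zero-shot prediction call.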
Abstract
Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for a wide array of tasks (e.g., content moderation, object classification). While these models have been shown to learn harmful correlations in various tasks, whether these correlations include logos remains understudied. Understanding this is especially important because logos are often used by public-facing entities like brands and government agencies. To that end, we develop SLANT: A Spurious Logo ANalysis Toolkit. Our key finding is that some logos indeed lead to spurious incorrect predictions; for example, adding the Adidas logo to a photo of a person causes a model to classify the person as greedy. SLANT contains a semi-automatic mechanism for…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Misinformation and Its Impacts
