When Style Similarity Scores Fail: Diagnosing Raw CSD Cosine in Artist-Style Evaluation
Jörg Frochte

TL;DR
This paper introduces the discrimination gap, a diagnostic for testing whether raw cosine similarity between Contrastive Style Descriptor (CSD) embeddings can be read as an absolute style-fidelity score in artist-style evaluation. It shows where raw CSD cosine fails and evaluates corrections for more accurate style-fidelity assessment.
Contribution
It presents a diagnostic for identifying when raw CSD cosine scores are unreliable, and reports improved evaluation accuracy with CSLS readout and positional-embedding interpolation.
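Positional-embedding interpolation is the usual way to run a Vision Transformer at a higher input resolution than it was trained on: the learned position embeddings of the patch grid are resized to the larger grid, and the class-token embedding is kept as is. The sketch below is a minimal illustration under stated assumptions, not the paper's code. It assumes the embeddings are stored as one class-token row followed by a square patch grid in row-major order, and it uses bilinear interpolation. The backbone layout, interpolation mode and target resolution used in the paper are not given here, and `interpolate_pos_embed` is a hypothetical helper.

```python
import numpy as np


def interpolate_pos_embed(pos_embed: np.ndarray, new_grid: int) -> np.ndarray:
    """Hypothetical helper: resize ViT position embeddings to a new square grid.

    Assumes pos_embed has shape (1 + old_grid**2, dim): row 0 is the class-token
    embedding and the remaining rows form the patch grid in row-major order.
    Returns an array of shape (1 + new_grid**2, dim).
    """
    cls_tok, grid_tok = pos_embed[:1], pos_embed[1:]
    n, dim = grid_tok.shape
    old_grid = int(round(np.sqrt(n)))
    if old_grid * old_grid != n:
        raise ValueError("patch embeddings do not form a square grid")
    grid = grid_tok.reshape(old_grid, old_grid, dim)

    # Source coordinate of each target cell center (half-pixel convention),
    # clamped to the valid index range of the old grid.
    coords = (np.arange(new_grid) + 0.5) * old_grid / new_grid - 0.5
    coords = np.clip(coords, 0, old_grid - 1)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, old_grid - 1)
    w = coords - lo  # weight of the upper neighbor

    # Separable bilinear interpolation: rows first, then columns.
    rows = grid[lo] * (1 - w)[:, None, None] + grid[hi] * w[:, None, None]
    out = rows[:, lo] * (1 - w)[None, :, None] + rows[:, hi] * w[None, :, None]
    # The class-token embedding is carried over unchanged.
    return np.concatenate([cls_tok, out.reshape(new_grid * new_grid, dim)], axis=0)
```

For example, with 14-pixel patches a 224-pixel input gives a 16×16 grid (224 / 14 = 16), so running the same hypothetical model at 448 pixels would call `interpolate_pos_embed(pe, 32)`. Bicubic interpolation is also common in ViT codebases; bilinear is used here only because it is short to write out with NumPy alone.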
Findings
Raw CSD cosine scores often fail to separate same-artist from different-artist pairs: some artists have negative discrimination gaps.
CSLS readout reduces false positives in style verification.
The diagnostic protocol improves the reliability of style similarity scores.
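CSLS (cross-domain similarity local scaling) was introduced for nearest-neighbor retrieval in word translation to counter hubness: items that sit close to many others get their similarities discounted. The following is a minimal within-corpus sketch, assuming L2-normalized embeddings and a user-chosen neighborhood size k (the default k = 10 is an assumption, not the paper's setting). It scores a pair as 2·cos(x, y) − r(x) − r(y), where r(·) is the mean cosine to the k nearest other items. `csls_matrix` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np


def csls_matrix(emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Hypothetical within-corpus CSLS readout: 2*cos(x, y) - r(x) - r(y).

    r(x) is the mean cosine of x to its k nearest neighbors in the corpus,
    excluding x itself; subtracting it discounts "hub" images that are
    close to everything. k=10 is an assumed default.
    """
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
    cos = z @ z.T
    n = cos.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of embeddings")
    masked = cos.copy()
    np.fill_diagonal(masked, -np.inf)  # an image is not its own neighbor
    # After partitioning, the k largest cosines of each row sit in the last k columns.
    r = np.partition(masked, n - k, axis=1)[:, n - k:].mean(axis=1)
    return 2 * cos - r[:, None] - r[None, :]
```

Because this readout only rescales scores computed from frozen embeddings, it can be applied on top of existing CSD features without retraining, e.g. `scores = csls_matrix(csd_embeddings, k=10)` for a hypothetical `(n_images, 768)` array `csd_embeddings`.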
Abstract
Raw cosine in the 768-dimensional output space of the Contrastive Style Descriptor (CSD) is now widely read as an absolute, calibrated style-fidelity score for text-to-image and style-imitation evaluation. We introduce the discrimination gap, a corpus-internal, prototype-free and threshold-free diagnostic that tests whether contrastive style cosines admit an absolute same-versus-different interpretation on a candidate artist corpus. On a 1799-artwork, 91-artist public-domain corpus, raw CSD cosine yields negative point-estimate gaps for … artists at the pairwise level (… robust under bootstrap) and for … in the aggregated-pool scoring regime that style-fidelity evaluations typically use. CSLS readout on the frozen backbone reduces the aggregated negative-gap count to …; combined with positional-embedding interpolation to … pixels, it raises unsupervised…
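The abstract does not spell out the formula for the discrimination gap, so what follows is only one plausible pairwise reading, written as a hypothetical sketch. For an artist a, take the mean cosine over pairs of distinct works by a, and subtract the highest mean cross-artist cosine between a's works and any single other artist's works. A negative value means a's works are, on average, more similar to another artist's works than to each other, which is the kind of failure that undermines reading raw cosine as an absolute same-versus-different score. Robustness is checked with a percentile bootstrap over a's works; treating an upper interval bound below zero as "robust" is likewise an assumption. `artist_gap` and `bootstrap_gap` are hypothetical names.

```python
import numpy as np


def artist_gap(emb, labels, artist, idx=None):
    """Hypothetical pairwise discrimination gap for one artist.

    gap = mean same-artist cosine over distinct works
          - max over other artists b of the mean cross cosine (artist, b).
    `idx` optionally gives (possibly repeated) row indices of the artist's
    works, e.g. a bootstrap resample; pairs of the same work are skipped.
    """
    emb = np.asarray(emb, dtype=float)
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    labels = np.asarray(labels)
    own = np.flatnonzero(labels == artist) if idx is None else np.asarray(idx)
    za = z[own]
    distinct = own[:, None] != own[None, :]  # drop self-pairs and duplicates
    same = (za @ za.T)[distinct].mean()
    cross = max(
        (za @ z[labels == b].T).mean()
        for b in np.unique(labels) if b != artist
    )
    return same - cross


def bootstrap_gap(emb, labels, artist, n_boot=1000, seed=0, alpha=0.05):
    """Point estimate plus a percentile bootstrap interval over the artist's works."""
    rng = np.random.default_rng(seed)
    own = np.flatnonzero(np.asarray(labels) == artist)
    if own.size < 2:
        raise ValueError("need at least two works by the artist")
    point = artist_gap(emb, labels, artist)
    stats = []
    while len(stats) < n_boot:
        idx = rng.choice(own, size=own.size, replace=True)
        if np.unique(idx).size < 2:
            continue  # a resample needs two distinct works to form a same-artist pair
        stats.append(artist_gap(emb, labels, artist, idx))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return point, lo, hi
```

Under this reading, an artist would be flagged when `point < 0`, and flagged robustly when the returned `hi < 0` as well. Because the sketch is written over cosines of normalized embeddings, swapping in a different readout means replacing the two matrix products with that readout's pairwise scores.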
