WP-CLIP: Leveraging CLIP to Predict Wölfflin's Principles in Visual Art
Abhijay Ghildyal, Li-Yun Wang, Feng Liu

TL;DR
This paper investigates whether CLIP, a vision-language model, can predict Wölfflin's stylistic principles in visual art, and fine-tunes it on annotated datasets of real art images to improve accuracy and generalization across diverse artworks.
Contribution
The study introduces WP-CLIP, a fine-tuned CLIP model capable of predicting Wölfflin's principles, advancing automated stylistic analysis in visual art.
Findings
WP-CLIP can predict stylistic principles with reasonable accuracy.
The model generalizes well across different art styles and datasets.
Fine-tuning improves CLIP's ability to interpret nuanced artistic features.
Abstract
Wölfflin's five principles offer a structured approach to analyzing stylistic variations for formal analysis. However, no existing metric effectively predicts all five principles in visual art. Computationally evaluating the visual aspects of a painting requires a metric that can interpret key elements such as color, composition, and thematic choices. Recent advancements in vision-language models (VLMs) have demonstrated their ability to evaluate abstract image attributes, making them promising candidates for this task. In this work, we investigate whether CLIP, pre-trained on large-scale data, can understand and predict Wölfflin's principles. Our findings indicate that it does not inherently capture such nuanced stylistic elements. To address this, we fine-tune CLIP on annotated datasets of real art images to predict a score for each principle. We evaluate our model, WP-CLIP, on…
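To make the idea of "a score for each principle" concrete, here is a minimal sketch of one common way to read a scalar score out of CLIP-style embeddings: compare an image embedding against a pair of contrasting text-prompt embeddings and apply a softmax over the two cosine similarities. This is an illustrative assumption, not necessarily the scoring scheme used in WP-CLIP. The function names, the `temperature` value, and the toy vectors are hypothetical; in practice the embeddings would come from CLIP's image and text encoders, for example with prompts along the lines of "a linear painting" versus "a painterly painting".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def principle_score(image_emb, pos_text_emb, neg_text_emb, temperature=0.01):
    """Score in [0, 1] from a two-way softmax over scaled cosine similarities.

    Values near 1 mean the image sits closer to the "positive" prompt,
    values near 0 mean it sits closer to the "negative" prompt.
    `temperature` is an assumed hyperparameter, not a value from the paper.
    """
    s_pos = cosine(image_emb, pos_text_emb) / temperature
    s_neg = cosine(image_emb, neg_text_emb) / temperature
    # Subtract the max before exponentiating for numerical stability.
    m = max(s_pos, s_neg)
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


# Toy 2-D embeddings standing in for real encoder outputs (hypothetical).
image = [1.0, 0.0]
prompt_a = [1.0, 0.0]  # stands in for one pole of a principle
prompt_b = [0.0, 1.0]  # stands in for the opposite pole
score = principle_score(image, prompt_a, prompt_b)
```

Under this sketch, fine-tuning would amount to adjusting the encoders so that scores of this kind match human annotations for each principle; the abstract reports that pre-trained CLIP alone does not capture these distinctions.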
Taxonomy
Topics: Aesthetic Perception and Analysis
