A Fusion Model for Artwork Identification Based on Convolutional Neural Networks and Transformers
Zhenyu Wang, Heng Song

TL;DR
This paper introduces a fusion model that combines CNNs and Transformers to improve artwork identification accuracy by leveraging both local and global features. The fusion model outperforms the individual CNN and Transformer models on Chinese painting and oil painting datasets.
Contribution
The paper presents a novel fusion approach that integrates CNNs and Transformers for enhanced artwork classification performance.
Findings
Fusion model outperforms individual CNN and Transformer models.
Classification accuracy improved by approximately 9.7% and 7.1%.
F1 scores increased by 0.06 and 0.05.
Abstract
The identification of artwork is crucial in areas like cultural heritage protection, art market analysis, and historical research. With the advancement of deep learning, Convolutional Neural Networks (CNNs) and Transformer models have become key tools for image classification. While CNNs excel in local feature extraction, they struggle with global context, and Transformers are strong in capturing global dependencies but weak in fine-grained local details. To address these challenges, this paper proposes a fusion model combining CNNs and Transformers for artwork identification. The model first extracts local features using CNNs, then captures global context with a Transformer, followed by a feature fusion mechanism to enhance classification accuracy. Experiments on Chinese and oil painting datasets show the fusion model outperforms individual CNN and Transformer models, improving…
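The pipeline described in the abstract (CNN for local features, Transformer for global context, then feature fusion before classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name FusionClassifier, the layer sizes, the two-layer CNN stem, mean pooling, and fusion by concatenation are all assumptions made for the example, since the summary does not specify the backbone or the fusion mechanism. Positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Hypothetical CNN + Transformer fusion classifier (illustrative only)."""

    def __init__(self, num_classes: int, dim: int = 64, heads: int = 4, layers: int = 2):
        super().__init__()
        # CNN stem (assumed): extracts local features, downsamples the image by 4x.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim // 2, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // 2, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Transformer encoder: models dependencies between all spatial positions.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=layers)
        # Fusion (assumed): concatenate pooled local and global descriptors.
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fmap = self.cnn(images)                    # (B, dim, H/4, W/4)
        tokens = fmap.flatten(2).transpose(1, 2)   # (B, N, dim), one token per position
        context = self.transformer(tokens)         # (B, N, dim)
        local_feat = fmap.mean(dim=(2, 3))         # (B, dim) pooled CNN features
        global_feat = context.mean(dim=1)          # (B, dim) pooled Transformer features
        fused = torch.cat([local_feat, global_feat], dim=1)  # (B, 2 * dim)
        return self.head(fused)                    # (B, num_classes) class logits


if __name__ == "__main__":
    model = FusionClassifier(num_classes=5)
    logits = model(torch.randn(2, 3, 64, 64))
    print(logits.shape)  # torch.Size([2, 5])
```

Concatenation is only one possible fusion choice. Weighted sums, gating, or cross-attention between the two feature streams would fit the same description, and the paper may use any of them.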
Taxonomy
Topics: Aesthetic Perception and Analysis · Generative Adversarial Networks and Image Synthesis · Art History and Market Analysis
Methods: Attention Is All You Need · Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer
