TL;DR
This paper presents a multimodal classification system for cultural heritage artifacts, combining image, text, and tabular data using deep learning and knowledge graphs, achieving high accuracy in property prediction.
Contribution
It introduces a novel multimodal classifier with a late fusion approach and a new dataset leveraging knowledge graphs for cultural heritage artifacts.
Findings
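The multimodal approach outperforms each of the individual classifiers.
All individual classifiers accurately predict missing artifact properties.
Deep learning classifiers are integrated with a Knowledge Graph, which is used to build the dataset and to store classification results.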
Abstract
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. The image classifier is based on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers and use the focal loss to handle class imbalance. Tabular data and late fusion are handled by Gradient Tree Boosting. We also show how we leveraged specific data models and a taxonomy in a Knowledge Graph to create the dataset and to store classification results. All individual classifiers accurately predict missing properties in the digitized silk artifacts, with the multimodal approach providing the best results.
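The abstract says the image and text classifiers use the focal loss to handle class imbalance. As a rough illustration of that loss only, and not the authors' implementation, the sketch below computes the standard multi-class focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single example in plain Python. The function names, the default gamma = 2.0 and alpha = 1.0, and the toy logits are assumptions chosen for illustration; the source does not state the hyperparameters used.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    so rare, hard examples contribute relatively more.
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy example (hypothetical logits): a confident correct prediction
# versus a confident wrong one, both with true class 0.
easy = focal_loss([4.0, 0.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0, 0.0], target=0)
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy. In a multitask setting such as the one described, one plausible arrangement would be to compute a loss of this form per predicted property and sum the results, though the abstract does not specify how the tasks are combined.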
Taxonomy
Methods: Convolution · Kaiming Initialization · Max Pooling · Average Pooling · Global Average Pooling · Focal Loss
