Multimodal Semantic Transfer from Text to Image. Fine-Grained Image Classification by Distributional Semantics
Simon Donig, Maria Christoforaki, Bernhard Bermeitinger, Siegfried Handschuh

TL;DR
This paper introduces a multimodal approach that transfers semantic information from text to image classification, using distributional semantics to enhance fine-grained image categorization in digital humanities.
Contribution
It presents a novel method combining CNNs with distributional semantic vectors derived from domain-specific texts for improved image classification.
Findings
Enhanced semantic understanding in image classification
Better handling of small and high-dimensional data
Improved accuracy in fine-grained categories
Abstract
In recent years, image classification methods such as neural networks have become widespread in art history and Heritage Informatics (Lang and Ommer 2018). These methods face several challenges in the Digital Humanities, including comparatively small amounts of data as well as high-dimensional data. Here, a Convolutional Neural Network (CNN) is used whose output is not, as usual, a series of flat text labels but a series of semantically loaded vectors. These vectors result from a Distributional Semantic Model (DSM) that is generated from an in-domain text corpus.
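The idea of predicting semantic vectors instead of flat labels can be illustrated with a minimal sketch. It assumes that a predicted vector is mapped back to a class by picking the label whose DSM vector is closest in cosine similarity; the abstract does not state how the vectors are decoded, so this decoding step is an assumption. The label names, the 3-dimensional vectors, and the helper names `cosine_similarity` and `nearest_label` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(predicted, label_vectors):
    """Return the label whose vector is most similar to the predicted one."""
    return max(
        label_vectors,
        key=lambda label: cosine_similarity(predicted, label_vectors[label]),
    )


# Made-up stand-ins for DSM vectors learned from an in-domain text corpus.
# Real DSM vectors would have far more dimensions.
LABEL_VECTORS = {
    "chair": [0.9, 0.1, 0.0],
    "armchair": [0.8, 0.3, 0.1],
    "table": [0.1, 0.9, 0.2],
}

# Made-up stand-in for the vector a CNN might output for one image.
predicted = [0.85, 0.25, 0.05]

best = nearest_label(predicted, LABEL_VECTORS)
print(best)
```

In a setup like this, a wrong prediction can still land near a semantically related label (here "chair" next to "armchair"), which is one way a vector-valued output might carry more information than a flat label.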
Taxonomy
Topics: Image Retrieval and Classification Techniques · Topic Modeling · Music and Audio Processing
