Contrastive Language-Image Pre-training for the Italian Language
Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, Sri Lakshmi

TL;DR
This paper introduces CLIP-Italian, a contrastive language-image model for Italian trained on more than 1.4 million image-text pairs. It outperforms the multilingual CLIP model on image retrieval and zero-shot classification.
Contribution
The first CLIP model for the Italian language, trained on more than 1.4 million image-text pairs, which outperforms the multilingual CLIP model on image retrieval and zero-shot classification.
Findings
CLIP-Italian outperforms multilingual CLIP in image retrieval.
CLIP-Italian achieves better zero-shot classification results than multilingual CLIP.
The model is trained on more than 1.4 million image-text pairs.
Abstract
CLIP (Contrastive Language-Image Pre-training) is a very recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since data in other languages might not be sufficient and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
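The abstract names two mechanisms without spelling them out: contrastive training on image-text pairs and zero-shot classification. The sketch below illustrates both in generic form using NumPy. It is not the authors' implementation: the function names, the temperature value of 0.07, and the use of precomputed embedding arrays in place of real image and text encoders are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Row i of image_emb and row i of text_emb are assumed to form a matching
    pair; every other row in the batch acts as a negative.
    The temperature default is an assumption, not a value from the paper.
    """
    image_emb = l2_normalize(image_emb)
    text_emb = l2_normalize(text_emb)
    # logits[i, j] = scaled cosine similarity between image i and text j
    logits = image_emb @ text_emb.T / temperature
    idx = np.arange(logits.shape[0])
    # Image-to-text: each image should pick its own caption among all texts.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image: each caption should pick its own image among all images.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


def zero_shot_classify(image_emb, label_text_emb):
    """Assign each image the label whose text embedding is most similar."""
    sims = l2_normalize(image_emb) @ l2_normalize(label_text_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random vectors stand in for encoder outputs (hypothetical data).
    images = rng.normal(size=(4, 8))
    texts = images + 0.1 * rng.normal(size=(4, 8))
    print("loss:", clip_contrastive_loss(images, texts))
    print("predicted labels:", zero_shot_classify(images, texts))
```

In a zero-shot setting, the rows of `label_text_emb` would come from encoding one text prompt per class label (Italian prompts, for a model like CLIP-Italian), so no task-specific training is needed. Image retrieval uses the same similarity matrix in the other direction, ranking images for a given text query.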
Taxonomy
Topics: Linguistic Studies and Language Acquisition · Natural Language Processing Techniques · Second Language Learning and Teaching
Methods: Linear Layer · Contrastive Language-Image Pre-training · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections
