RuCLIP -- new models and experiments: a technical report
Alex Shonenkov, Andrey Kuznetsov, Denis Dimitrov, Tatyana Shavrina, Daniil Chesakov, Anastasia Maltseva, Alena Fenogenova, Igor Pavlov, Anton Emelyanov, Sergey Markov, Daria Bakshandaeva, Vera Shybaeva, Andrey Chertok

TL;DR
This technical report introduces six new ruCLIP models trained on 240 million pairs. The best of them are more accurate than the original CLIP combined with OPUS-MT Ru-En translation on most of the 16 evaluation datasets in few-shot and zero-shot tasks. The report also covers implementation details and inference times.
Contribution
The report presents six novel ruCLIP model implementations trained on large-scale data, outperforming existing CLIP + OPUS-MT solutions in various few-shot and zero-shot tasks.
Findings
Best models outperform CLIP + OPUS-MT on most datasets
Models excel in few-shot and zero-shot tasks
Inference time analysis provided
Abstract
In this report we propose six new implementations of the ruCLIP model trained on our 240M pairs. Accuracy is compared with the original CLIP model combined with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. We briefly describe the implementations and concentrate on the conducted experiments. An inference execution time comparison is also presented.
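The zero-shot task mentioned in the abstract can be illustrated with a minimal sketch of how a CLIP-style model is typically used for classification: the image embedding is compared with the embedding of each candidate label text by cosine similarity, and a softmax turns the scaled similarities into class probabilities. This is not the authors' code. The function names, the toy three-dimensional embeddings, and the temperature value of 100 are assumptions made for illustration; in practice the embeddings would come from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, temperature=100.0):
    """Return softmax probabilities over candidate label embeddings.

    temperature is an assumed logit scale, not a value taken from the report.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(temperature * cosine)
    # Subtract the maximum logit for a numerically stable softmax.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): real ones would be produced by the encoders.
image_emb = [0.9, 0.1, 0.0]
text_embs = {
    "кошка": [1.0, 0.0, 0.0],   # "cat"
    "собака": [0.0, 1.0, 0.0],  # "dog"
    "машина": [0.0, 0.0, 1.0],  # "car"
}

labels = list(text_embs)
probs = zero_shot_probs(image_emb, [text_embs[label] for label in labels])
predicted = labels[probs.index(max(probs))]
print(predicted)
```

With a Russian-language model such as ruCLIP the label texts can be embedded directly. The CLIP + OPUS-MT baseline would instead translate each Russian label to English before embedding it with the original CLIP text encoder.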
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Contrastive Language-Image Pre-training
