BERTino: an Italian DistilBERT model
Matteo Muffo, Enrico Bertino

TL;DR
BERTino is a lightweight Italian language model based on DistilBERT. It performs comparably to BERT-base while training and running inference faster, which addresses the high computational demands of large NLP models.
Contribution
This work introduces BERTino, the first Italian-specific lightweight DistilBERT model, improving efficiency while maintaining high performance.
Findings
BERTino achieves F1 scores comparable to BERT-base.
BERTino trains and runs inference markedly faster than BERT-base.
BERTino was evaluated on the Italian ISDT, Italian ParTUT, Italian WikiNER and multiclass classification tasks.
Abstract
The recent introduction of Transformer-based language representation models has enabled great improvements in many natural language processing (NLP) tasks. However, while the performance achieved by this kind of architecture is impressive, its usability is limited by the large number of parameters in the network, which results in high computational and memory demands. In this work we present BERTino, a DistilBERT model intended as the first lightweight alternative to the BERT architecture specific to the Italian language. We evaluated BERTino on the Italian ISDT, Italian ParTUT, Italian WikiNER and multiclass classification tasks, obtaining F1 scores comparable to those obtained by BERT-base, with a remarkable improvement in training and inference speed.
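The abstract compares models by F1 score. As a minimal sketch of how such a score can be computed for a tagging or classification task, the snippet below derives per-class and macro-averaged F1 from gold and predicted labels. The averaging scheme is an assumption (this page does not say which one the authors used), the function names `per_class_f1` and `macro_f1` are hypothetical, and the labels are made-up illustration data, not results from the paper.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for two equal-length sequences of labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true label but was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Illustrative labels only (hypothetical POS tags).
gold = ["NOUN", "VERB", "NOUN", "ADJ"]
pred = ["NOUN", "NOUN", "NOUN", "ADJ"]
scores = per_class_f1(gold, pred)
overall = macro_f1(gold, pred)
```

Here `scores` holds one F1 value per label and `overall` is their unweighted mean. A micro-averaged or support-weighted variant would weight frequent labels more heavily, so the choice matters when comparing two models on imbalanced tag sets.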
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · DistilBERT · Dropout · Dense Connections · Weight Decay
