DistilCamemBERT: a distillation of the French model CamemBERT
Cyrile Delestre, Abibatou Amar

TL;DR
This paper introduces DistilCamemBERT, a smaller, more efficient version of the French NLP model CamemBERT, designed to facilitate industrial deployment while maintaining strong performance.
Contribution
The paper presents a distillation process that substantially reduces the size and computational cost of CamemBERT while preserving good performance.
Findings
Reduced model size and inference time
Maintained competitive performance on NLP tasks
Facilitates industrial deployment of French NLP models
Abstract
Modern Natural Language Processing (NLP) models based on Transformer architectures represent the state of the art in terms of performance on very diverse tasks. However, these models are complex and have several hundred million parameters, even for the smallest of them. This may hinder their adoption at the industrial level, making it difficult to deploy them at scale on a reasonable infrastructure and/or to comply with societal and environmental responsibilities. To this end, we present in this paper a model that drastically reduces the computational cost of a well-known French model (CamemBERT), while preserving good performance.
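To make the idea of distillation concrete, here is a minimal sketch of a soft-target loss of the kind commonly used in knowledge distillation, where a small student model is trained to match the output distribution of a larger teacher. This page does not specify the paper's actual training objective, so the function names, the temperature value, and the toy logits below are illustrative assumptions and not the authors' implementation.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature ** 2 factor is the usual rescaling that keeps gradient
    magnitudes comparable across temperatures. All names and defaults here
    are hypothetical choices made for this sketch.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must score the same classes")
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Toy logits over three classes (made-up numbers, for illustration only).
teacher = [1.0, 2.0, 3.0]
student = [0.5, 2.5, 2.0]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's softened distribution exactly and grows as the two diverge. In practice such a term is typically combined with other objectives, for example the ordinary task loss on the training labels.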
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Label Smoothing
