O_FT@EvalLLM2025 : étude comparative de choix de données et de stratégies d'apprentissage pour l'adaptation de modèles de langue à un domaine

Ismaël Rousseau; Claire Perroux; Pierre Adam; Thomas Girault; Lionel Delphin-Poulat; Morgan Veyret; Gwénolé Lecorvé; Géraldine Damnati

arXiv:2507.04895·cs.CL·July 8, 2025

TL;DR

This study adapts a language model to the defense domain using continued pre-training and instruction-tuning. The adapted models show better domain knowledge and better performance on domain tasks, and the authors take the carbon footprint of the adaptation into account.

Contribution

The paper presents a process for collecting, generating, and selecting data to adapt a relatively small language model (Mistral-7B-Instruct-v0.3) to the defense domain, and reports effective results.

Findings

01. Enhanced domain-specific knowledge and task handling in the adapted models.

02. Comparable or even superior performance on general knowledge and skills.

03. Domain adaptation is feasible for relatively small models, taking the carbon footprint of the adaptation into account.

Abstract

This paper presents the work carried out by the O_FT team, a joint team from Orange and Ouest-France, on adapting language models to the defense domain as part of the EvalLLM2025 challenge. The work focused on adapting the Mistral-7B-Instruct-v0.3 model using the classical techniques of continued pre-training and instruction-tuning. Most of the effort went into collecting, generating, and selecting data for these two stages, as well as for model evaluation. Experiments show that the adapted models have better domain-specific knowledge and handle domain-specific tasks better, while their performance on general knowledge and skills is comparable or even superior. Considering the carbon footprint of these adaptations, the work demonstrates the feasibility of domain adaptation for relatively small models. -- Ce document présente les travaux réalisés par l'équipe…
