EngGPT2: Sovereign, Efficient and Open Intelligence

G. Ciarfaglia; A. Rosanova; S. Cipolla; J. Bartoli; A. Di Domenico; C. Fioroni; A. Fontana; M. R. Scoleri; M. I. Mone; D. Franchi; M. C. Del Gaudio; A. Leodori; F. Cinti; M. Capozzi; C. Baston; F. Picariello; M. Gabusi; S. Bonura; V. Morreale; I. Bailo

arXiv:2603.16430 · cs.CL · March 31, 2026

TL;DR

EngGPT2-16B-A3B is a resource-efficient, open, and European-focused Mixture-of-Experts language model trained on 2.5 trillion tokens, achieving competitive performance with reduced training and inference costs.

Contribution

This paper introduces EngGPT2, a novel 16-billion-parameter MoE model optimized for efficiency, multilingual reasoning, and European NLP tasks, with a focus on open and sovereign AI development.

Findings

01

Achieves performance comparable to dense models in the 8B-16B range on MMLU-Pro, GSM8K, IFEval and HumanEval.

02

Requires one-fifth to one-half of the inference power and one-tenth to one-sixth of the training data of comparable dense models.

03

Supports multiple reasoning modes, including real-time turbo-reasoning.

Abstract

EngGPT2-16B-A3B is the latest iteration of Engineering Group's Italian LLM, built to be a Sovereign, Efficient and Open model. EngGPT2 is trained on 2.5 trillion tokens, fewer than Qwen3's 36T or Llama3's 15T, and delivers performance on key benchmarks, including MMLU-Pro, GSM8K, IFEval and HumanEval, comparable to dense models in the 8B-16B range, while requiring one-fifth to one-half of the inference power, and between one-tenth and one-sixth of the training data and of the training power that follows from it. Designed as a trained-from-scratch Mixture-of-Experts (MoE) architecture, EngGPT2 features 16 billion parameters with 3 billion active per inference, with expert sizes positioned between those used in GPT-OSS and Qwen3. Approximately 25% of its training corpus consists of Italian-language data, to deliver strong capabilities for European and Italian NLP tasks among models of similar…
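
The ratios quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (2.5T, 36T and 15T tokens; 16B total and 3B active parameters; roughly 25% Italian data). Two things in it are assumptions rather than claims from the paper: the variable names are illustrative, and the use of active-parameter count as a rough proxy for inference cost is a simplification, since actual inference power also depends on hardware, batch size and memory traffic.

```python
# Figures quoted in the abstract.
# Token counts are in trillions, parameter counts in billions.
ENGGPT2_TOKENS_T = 2.5
QWEN3_TOKENS_T = 36.0
LLAMA3_TOKENS_T = 15.0
TOTAL_PARAMS_B = 16.0
ACTIVE_PARAMS_B = 3.0
ITALIAN_SHARE = 0.25  # "approximately 25%" of the training corpus

# Training-data ratios relative to the two reference models.
data_vs_qwen3 = ENGGPT2_TOKENS_T / QWEN3_TOKENS_T
data_vs_llama3 = ENGGPT2_TOKENS_T / LLAMA3_TOKENS_T

# ASSUMPTION: active parameters as a crude proxy for per-token inference
# cost, compared against dense models at both ends of the 8B-16B range.
active_vs_dense_16b = ACTIVE_PARAMS_B / 16.0
active_vs_dense_8b = ACTIVE_PARAMS_B / 8.0

# Fraction of the MoE's own parameters that are active per inference.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Approximate Italian-language token count, in trillions.
italian_tokens_t = ENGGPT2_TOKENS_T * ITALIAN_SHARE

if __name__ == "__main__":
    print(f"data vs Qwen3:       {data_vs_qwen3:.3f}")
    print(f"data vs Llama3:      {data_vs_llama3:.3f}")
    print(f"active vs 16B dense: {active_vs_dense_16b:.3f}")
    print(f"active vs 8B dense:  {active_vs_dense_8b:.3f}")
    print(f"active fraction:     {active_fraction:.3f}")
    print(f"Italian tokens (T):  {italian_tokens_t:.3f}")
```

Under this proxy, the active-parameter ratio falls between about one-fifth (against a 16B dense model) and somewhat under one-half (against an 8B dense model), and the data ratio against Llama3 is one-sixth. The ratio against Qwen3 comes out nearer one-fourteenth than one-tenth, so the abstract's lower bound may be rounded or may refer to a comparison not stated in this excerpt.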

Code & Models

Models

Hugging Face: engineering-group/EngGPT2-16B-A3B (model, 2.6k downloads, 18 likes)
