ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Seonwu Kim; Yohan Na; Kihun Kim; Hanhee Cho; Geun Lim; Mintae Kim; Seongik Park; Ki Hyun Kim; Youngsub Han; Byoung-Ki Jeon

arXiv:2507.06795·cs.CL·October 24, 2025

TL;DR

This paper applies a Domain Adaptive Continual Pretraining (DACP) recipe to small LLMs (sLLMs), producing the ixi-GEN models. These models achieve substantial gains in target-domain performance while preserving general capabilities, which makes them a cost-efficient option for enterprise deployment.

Contribution

The paper validates a DACP-based recipe for sLLMs across diverse foundation models and service domains, demonstrating its effectiveness through extensive experiments and real-world evaluations.

Findings

01. DACP improves sLLMs' domain-specific performance

02. ixi-GEN models retain general capabilities

03. Cost-efficient and scalable for enterprise deployment

Abstract

The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative despite inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been explored for domain adaptation, its utility in commercial settings remains under-examined. In this study, we validate the effectiveness of a DACP-based recipe across diverse foundation models and service domains, producing DACP-applied sLLMs (ixi-GEN). Through extensive experiments and real-world evaluations, we demonstrate that ixi-GEN models achieve substantial gains in target-domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level…
