Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization

Takato Yasuno

arXiv:2603.18037 · cs.LG · March 20, 2026

Open Access

TL;DR

This paper develops a systematic approach for creating efficient Japanese small language models tailored to specific domains, optimizing training scale, model selection, and quantization for deployment on consumer hardware.

Contribution

The paper introduces a three-stage methodology for building domain-specific Japanese small LMs with QLoRA fine-tuning: choosing the training scale, comparing base models, and applying architecture-aware quantization.

Findings

01. Optimal training scale identified at 4,000 samples.

02. Japanese continual pre-trained Llama-3 models outperform multilingual models.

03. Q4_K_M quantization improves Llama-3 architectures but degrades GQA architectures.
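
The first finding amounts to picking the training scale at which held-out NLL is lowest. A minimal sketch of that selection rule follows, assuming one test-set NLL measurement per training-set size. The function name `pick_training_scale` and every NLL value except 1.127 at 4,000 samples are hypothetical placeholders, not results from the paper.

```python
from typing import Mapping


def pick_training_scale(test_nll: Mapping[int, float]) -> int:
    """Return the sample count with the lowest test-set NLL.

    Ties are resolved in favor of the smaller training set.
    """
    if not test_nll:
        raise ValueError("no NLL measurements supplied")
    # Iterating the sizes in ascending order makes min() keep the smallest
    # size when two sizes share the same NLL.
    return min(sorted(test_nll), key=lambda n: test_nll[n])


# Only the 4,000-sample value (1.127) comes from the abstract.
# The other four values are made-up placeholders for illustration.
example_nll = {
    1000: 1.30,   # placeholder
    2000: 1.22,   # placeholder
    3000: 1.16,   # placeholder
    4000: 1.127,  # reported minimum
    5000: 1.15,   # placeholder; the abstract only says 5k overfits
}

best_scale = pick_training_scale(example_nll)
print(best_scale)
```

Selecting on a held-out set rather than on training loss is what lets the rule detect the overfitting the abstract reports at 5k samples.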

Abstract

This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments (1k to 5k samples) identify n=4,000 as optimal, where test-set NLL reaches its minimum (1.127) before overfitting at 5k samples. Stage 2 (Comparing fine-tuned SLMs): A comparison of four Japanese LLMs shows that Llama-3 models with Japanese continual pre-training (Swallow-8B, ELYZA-JP-8B) outperform multilingual models (Qwen2.5-7B). Stage 3 (Quantization): Llama-3 architectures improve under Q4_K_M quantization, while GQA architectures degrade severely (Qwen2.5: -0.280 points). Production recommendation: Swallow-8B Q4_K_M achieves a score of 2.830/3 at 8.9 s/question with a 4.9 GB model size. The methodology generalizes to…
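
The deployment figures in the production recommendation can be sanity-checked with simple arithmetic. The sketch below assumes a nominal 8.0 billion parameters for Swallow-8B and 1 GB = 10^9 bytes; neither figure is stated in the abstract. The helper names are hypothetical.

```python
def bits_per_weight(file_size_gb: float, n_params: float,
                    bytes_per_gb: float = 1e9) -> float:
    """Average number of stored bits per parameter for a model file."""
    return file_size_gb * bytes_per_gb * 8 / n_params


def fp16_size_gb(n_params: float, bytes_per_gb: float = 1e9) -> float:
    """Size of the same model stored at 16 bits (2 bytes) per parameter."""
    return n_params * 2 / bytes_per_gb


N_PARAMS = 8.0e9        # assumption: nominal "8B" parameter count
Q4_K_M_SIZE_GB = 4.9    # from the abstract
SECONDS_PER_Q = 8.9     # from the abstract

bpw = bits_per_weight(Q4_K_M_SIZE_GB, N_PARAMS)         # about 4.9 bits
compression = fp16_size_gb(N_PARAMS) / Q4_K_M_SIZE_GB   # a bit over 3x
questions_per_hour = 3600 / SECONDS_PER_Q               # roughly 400

print(f"{bpw:.2f} bits/weight, {compression:.2f}x smaller than fp16, "
      f"{questions_per_hour:.0f} questions/hour")
```

Under these assumptions the quantized file averages a little under 5 bits per weight, which is consistent with a 4-bit scheme that keeps some tensors and scaling metadata at higher precision.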

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis