PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Preferred Elements: Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai, Kosuke Nakago, Daisuke Nishino, Toru Ogawa, Daisuke Okanohara, Yoshihiko Ozaki, Shotaro Sano, Shuji Suzuki, Tianqi Xu, Toshihiko Yanase

arXiv:2410.07563 · cs.CL · October 23, 2024

TL;DR

PLaMo-100B is a large-scale Japanese language model trained from scratch on 2 trillion tokens. It uses QK Normalization and Z-Loss to stabilize training and is refined through Supervised Fine-Tuning and Direct Preference Optimization, reaching results on Japanese tasks that are competitive with frontier models such as GPT-4.

Contribution

The paper introduces PLaMo-100B, a Japanese language model built from scratch, and describes the pre-training and post-training techniques used to improve its Japanese language proficiency.

Findings

01

Achieved competitive performance on Japanese-specific benchmarks.

02

Used QK Normalization and Z-Loss to stabilize training.

03

The base model is publicly available for research and development.
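
QK Normalization and Z-Loss are established stability techniques rather than components unique to PLaMo-100B. The plain-Python sketch that follows illustrates the general idea under stated assumptions: RMSNorm without a learned gain stands in for whatever normalization the model actually applies to queries and keys, and the z-loss coefficient of 1e-4 is the value reported for PaLM, not a figure taken from this paper. All function names are hypothetical.

```python
import math

# Hypothetical helpers illustrating QK Normalization and Z-Loss in plain Python.
# Assumptions: RMSNorm without a learned gain is used for the query/key
# normalization, and the z-loss coefficient defaults to 1e-4 (PaLM's value).


def rms_norm(vec, eps=1e-6):
    """Rescale a vector to unit root-mean-square (RMSNorm with no learned gain)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def qk_norm_attention_weights(query, keys):
    """Attention weights of one query over several keys, with QK normalization.

    The query and each key are normalized before the scaled dot product, so
    every logit is bounded by sqrt(d) in magnitude no matter how large the raw
    activations grow.
    """
    d = len(query)
    q = rms_norm(query)
    logits = [
        sum(qi * ki for qi, ki in zip(q, rms_norm(k))) / math.sqrt(d)
        for k in keys
    ]
    return softmax(logits)


def logsumexp(logits):
    """log(sum(exp(logits))), computed stably."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_coeff=1e-4):
    """Next-token cross-entropy plus the z-loss term z_coeff * (log Z)^2.

    log Z is the log of the softmax normalizer. Cross-entropy alone is
    unchanged when all logits shift by a constant, so it cannot stop them from
    drifting; the squared log Z penalty pulls log Z back toward zero.
    """
    log_z = logsumexp(logits)
    ce = log_z - logits[target]
    return ce + z_coeff * log_z ** 2
```

Because the query is normalized, multiplying it by 1000 leaves the attention weights essentially unchanged, which is the property that keeps attention logits from growing without bound. Likewise, shifting every output logit by the same constant leaves the cross-entropy term unchanged but increases the z-loss term.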

Abstract

We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, incorporating techniques such as QK Normalization and Z-Loss to ensure stability throughout training. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4. The base model is available at https://huggingface.co/pfnet/plamo-100b.
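
Direct Preference Optimization is one of the post-training steps named in the abstract. A minimal plain-Python sketch of the standard per-pair DPO objective follows; it takes precomputed sequence log-probabilities as inputs, and the default beta=0.1 is an assumed, commonly used value rather than the setting used for PLaMo-100B. This page does not say whether the paper uses exactly this formulation or a variant, and the function names are hypothetical.

```python
import math

# Hypothetical sketch of the standard DPO objective for one preference pair.
# Assumption: beta=0.1 is a common default, not PLaMo-100B's reported setting.


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single (chosen, rejected) response pair.

    Each argument is the summed token log-probability of a full response,
    under either the policy being trained or the frozen reference model
    (typically the SFT checkpoint). The loss falls as the policy raises the
    chosen response's likelihood relative to the reference more than it
    raises the rejected one's.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))
```

When the policy still equals the reference, both log-ratios are zero and the loss is log 2. In training, the loss is averaged over a batch of preference pairs, with the reference log-probabilities computed without gradients.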

Code & Models

Models

pfnet/plamo-100b (Hugging Face model · 138 downloads · 18 likes)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding