LiLiuM: eBay's Large Language Models for e-commerce
Christian Herold, Michael Kozielski, Leonid Ekimov, Pavel Petrushkov, Pierre-Yves Vandenbussche, Shahram Khadivi

TL;DR
eBay developed the LiLiuM series of large language models tailored for e-commerce, achieving competitive performance on general benchmarks and superior results on e-commerce and multilingual tasks, with full control over data and architecture.
Contribution
The paper presents LiLiuM, a series of LLMs developed in-house at eBay for e-commerce, with custom training data, tokenizer, and architecture. The models outperform LLaMA-2 on non-English NLU, machine translation, and e-commerce tasks.
Findings
LiLiuM models perform on par with LLaMA-2 on English NLU benchmarks.
LiLiuM outperforms LLaMA-2 on non-English NLU and e-commerce tasks.
Checkpoint averaging improves model performance.
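The summary does not say how the checkpoints were combined. As an illustration only, one common form of checkpoint averaging takes the uniform element-wise mean of the parameters saved at several training steps. The sketch below assumes that form and uses plain Python lists in place of real tensors; the function name `average_checkpoints` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# A "checkpoint" here is a mapping from parameter name to a flat list of floats.
# Real checkpoints hold tensors; flat lists keep the sketch dependency-free.
Params = Dict[str, List[float]]


def average_checkpoints(checkpoints: Sequence[Params]) -> Params:
    """Return the uniform element-wise mean of parameters across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Params = {}
    for name in checkpoints[0]:
        tensors = [ckpt[name] for ckpt in checkpoints]
        # zip(*tensors) yields the values at the same position in every checkpoint.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return averaged


# Toy example: three checkpoints of a model with two parameter groups
# (hypothetical values, chosen so the means are easy to check by hand).
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [2.0, 4.0], "b": [3.0]},
    {"w": [3.0, 6.0], "b": [6.0]},
]
avg = average_checkpoints(ckpts)
print(avg)
```

Which checkpoints to average, and whether to weight them, are design choices this sketch leaves open.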
Abstract
We introduce the LiLiuM series of large language models (LLMs): 1B, 7B, and 13B parameter models developed 100% in-house to fit eBay's specific needs in the e-commerce domain. This gives eBay full control over all aspects of the models including license, data, vocabulary, and architecture. We expect these models to be used as a foundation for fine-tuning and instruction-tuning, eliminating dependencies on external models. The LiLiuM LLMs have been trained on 3 trillion tokens of multilingual text from the general and e-commerce domains. They perform similarly to the popular LLaMA-2 models on English natural language understanding (NLU) benchmarks. At the same time, we outperform LLaMA-2 on non-English NLU tasks, machine translation, and e-commerce-specific downstream tasks. As part of our data mixture, we utilize the newly released RedPajama-V2 dataset for training and share our insights…
Taxonomy
Topics: Customer churn and segmentation · Semantic Web and Ontologies
