Performance Trade-offs of Optimizing Small Language Models for E-Commerce

Josip Tomo Licardo; Nikola Tankovic

arXiv:2510.21970 · cs.AI · October 28, 2025

TL;DR

This paper demonstrates that small, optimized open-weight language models can achieve near state-of-the-art accuracy in e-commerce tasks while significantly reducing computational costs and latency, making them practical for domain-specific applications.

Contribution

The paper introduces a methodology for fine-tuning and quantizing a small multilingual Llama 3.2 model for e-commerce intent recognition, matching larger models' performance with lower resource requirements.

Findings

01

A fine-tuned 1B-parameter model can match larger models' accuracy on e-commerce intent recognition, reaching 99%.

02

Post-training quantization (GPTQ, GGUF) significantly reduces memory usage.

03

Hardware-dependent trade-offs affect inference speed and efficiency.
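
To give a rough sense of why quantization reduces memory, the sketch below estimates the storage needed for the weights of a one-billion-parameter model at several bit widths. The bit widths listed are illustrative assumptions, not the configurations reported in the paper, and the estimate covers weights only: real quantized formats also store scales and other metadata, and inference needs additional memory for activations and the KV cache.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# "One-billion-parameter" model, rounded; the exact count is an assumption.
NUM_PARAMS = 1_000_000_000

# Illustrative bit widths (assumed, not taken from the paper).
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage by a factor of four, which is the kind of saving that makes CPU or small-GPU deployment plausible.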

Abstract

Large Language Models (LLMs) offer state-of-the-art performance in natural language understanding and generation tasks. However, the deployment of leading commercial models for specialized tasks, such as e-commerce, is often hindered by high computational costs, latency, and operational expenses. This paper investigates the viability of smaller, open-weight models as a resource-efficient alternative. We present a methodology for optimizing a one-billion-parameter Llama 3.2 model for multilingual e-commerce intent recognition. The model was fine-tuned using Quantized Low-Rank Adaptation (QLoRA) on a synthetically generated dataset designed to mimic real-world user queries. Subsequently, we applied post-training quantization techniques, creating GPU-optimized (GPTQ) and CPU-optimized (GGUF) versions. Our results demonstrate that the specialized 1B model achieves 99% accuracy, matching the…
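
As a hedged illustration of why low-rank adaptation is cheap to train, the sketch below counts trainable parameters for a single weight matrix. In LoRA-style methods the base weight of shape d_out x d_in stays frozen (in QLoRA it is also held in quantized form), and only two small matrices of rank r are trained. The layer size and rank used here are hypothetical values chosen for illustration; the abstract does not state the paper's settings.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in the low-rank pair A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, not taken from the paper.
D_IN, D_OUT, RANK = 2048, 2048, 16

dense = full_params(D_IN, D_OUT)
adapter = lora_trainable_params(D_IN, D_OUT, RANK)
print(f"dense: {dense}, adapter: {adapter}, fraction: {adapter / dense:.4%}")
```

With these assumed values the adapter holds under 2% of the parameters of the layer it modifies, which is why fine-tuning of this kind fits on modest hardware.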
