Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement

Nils Strassenburg; Boris Glavic; Tilmann Rabl

arXiv:2512.05525·cs.DB·May 5, 2026

TL;DR

This paper introduces just-in-time model replacement (JITR), an approach that replaces a large language model with a smaller, task-specific model to reduce resource use while maintaining predictive performance on that task. The Poodle prototype demonstrates the approach.

Contribution

The paper proposes seamless, just-in-time replacement of LLMs with cheaper models, addressing the high resource and energy cost of using LLMs for simple, recurring tasks.

Findings

01 JITR achieves significant cost savings on example tasks.

02 The Poodle prototype demonstrates effective model replacement.

03 Model search and transfer learning are key to JITR's success.

Abstract

Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training examples and can be utilized by users without expertise in model development. However, this comes at the cost of substantially higher resource and energy consumption compared to smaller models, which often achieve similar predictive performance for simple tasks. In this paper, we present our vision for just-in-time model replacement (JITR), where, upon identifying a recurring task in calls to an LLM, the model is replaced transparently with a cheaper alternative that performs well for this specific task. JITR retains the ease of use and low development effort of LLMs, while saving significant cost and energy. We discuss the main challenges in realizing our vision regarding the identification of recurring…
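The replacement idea described in the abstract can be illustrated with a minimal sketch. This is not Poodle's implementation or API: the names `JitRouter`, `task_signature`, and `build_replacement`, the first-line heuristic for recognizing a recurring task, and the call-count threshold are all hypothetical, chosen only to show a caller-transparent switch from an LLM to a cheaper model.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

# A "model" here is any callable from prompt text to answer text (assumption).
Model = Callable[[str], str]


def task_signature(prompt: str) -> str:
    """Hypothetical heuristic: use the normalized first line as the task identity."""
    stripped = prompt.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return re.sub(r"\s+", " ", first_line).lower()


class JitRouter:
    """Routes calls to the LLM until a task recurs often enough, then swaps models."""

    def __init__(
        self,
        llm: Model,
        build_replacement: Callable[[List[str]], Model],
        threshold: int = 3,
    ) -> None:
        self.llm = llm
        # Placeholder for whatever produces the cheaper model from observed prompts.
        self.build_replacement = build_replacement
        self.threshold = threshold  # hypothetical recurrence threshold
        self.seen: Dict[str, List[str]] = defaultdict(list)
        self.replacements: Dict[str, Model] = {}

    def __call__(self, prompt: str) -> str:
        sig = task_signature(prompt)
        # Once a replacement exists, the caller is served by it transparently.
        if sig in self.replacements:
            return self.replacements[sig](prompt)
        # Otherwise answer with the LLM and record the prompt for this task.
        self.seen[sig].append(prompt)
        answer = self.llm(prompt)
        if len(self.seen[sig]) >= self.threshold:
            self.replacements[sig] = self.build_replacement(self.seen[sig])
        return answer
```

The caller uses `router(prompt)` for every request and never chooses a model itself, which is the sense in which the swap is transparent. In a real system the recurrence detection and the construction of the replacement model would be far more involved than this counter and callback.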
