TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation

Alfredo Garrachón Ruiz; Tomás de la Rosa; Daniel Borrajo

arXiv:2412.07682 · cs.CL · November 25, 2025

Open Access

TL;DR

TRIM cuts LLM inference cost by having the large model omit redundant, easily inferable words during generation and using a smaller, cheaper model to reconstruct the full answer, saving 19.4% of tokens on average with GPT-4o at the cost of only a tiny decrease in evaluation metrics.

Contribution

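The paper introduces TRIM, a pipeline in which an LLM is prompted to omit a predefined set of semantically irrelevant, easily inferable words during generation, and a specifically trained smaller language model then reconstructs the full answer. It also proposes NaLDA, an evaluation dataset focused on the reconstruction task.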

Findings

01. 19.4% token savings on average with GPT-4o

02. Tiny decrease in evaluation metrics

03. Effective balance of efficiency and accuracy

Abstract

The high inference cost of Large Language Models (LLMs) poses challenges, especially for tasks requiring lengthy outputs. However, natural language often contains redundancy, which presents an opportunity for optimization. We have observed that LLMs can generate distilled language (i.e., concise outputs that retain essential meaning) when prompted appropriately. We propose TRIM, a pipeline for saving computational cost in which the LLM omits a predefined set of semantically irrelevant and easily inferable words based on the context during inference. Then, a specifically trained smaller language model with lower inference cost reconstructs the distilled answer into the ideal answer. Our experiments show promising results, particularly on the proposed NaLDA evaluation dataset focused on the reconstruction task, with 19.4% saved tokens on average for GPT-4o and only a tiny decrease in…
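The distillation step and the token-savings figure can be illustrated with a minimal Python sketch. It is not the authors' implementation: in TRIM the LLM itself produces the distilled text when prompted, whereas here a simple filter simulates that output. The word set `OMITTED_WORDS`, the helper names `distill` and `token_savings`, and the use of whitespace tokens instead of a model tokenizer are all assumptions made for illustration. The reconstruction step by the smaller model is not shown.

```python
# Hypothetical omitted-word set; the paper's actual predefined set is not given here.
OMITTED_WORDS = {"a", "an", "the", "is", "are", "of", "to"}


def distill(text: str) -> str:
    """Simulate a distilled answer by dropping words found in OMITTED_WORDS."""
    kept = [word for word in text.split() if word.lower() not in OMITTED_WORDS]
    return " ".join(kept)


def token_savings(full: str, distilled: str) -> float:
    """Fraction of whitespace tokens saved (an assumption; real systems count model tokens)."""
    n_full = len(full.split())
    if n_full == 0:
        return 0.0
    return 1.0 - len(distilled.split()) / n_full


full_answer = "The capital of France is Paris"
distilled_answer = distill(full_answer)
savings = token_savings(full_answer, distilled_answer)

print(distilled_answer)
print(f"{savings:.1%}")
```

In this toy example half of the whitespace tokens are dropped; savings on real answers depend on the tokenizer and on which words the model actually omits.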

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling