DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Sirui Chen; Mengshi Zhao; Lei Xu; Yuying Zhao; Beier Zhu; Hanwang Zhang; Shengjie Zhao; Chaochao Lu

arXiv:2511.15392·cs.CL·November 20, 2025

TL;DR

DEPO introduces a dual-efficiency framework for LLM agents, optimizing both token usage and step count to improve interaction efficiency without sacrificing performance.

Contribution

The paper proposes a novel dual-efficiency preference optimization method for LLM agents, balancing response succinctness and action steps to enhance efficiency.

Findings

01 Token usage reduced by up to 60.9%
02 Steps decreased by up to 26.9%
03 Performance improved by up to 29.3%

Abstract

Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of a longer chain of thought (CoT), hampering interaction efficiency in real-world scenarios. Moreover, a systematic definition of LLM agent efficiency is still lacking, which hinders targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3%…
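
The two efficiency notions in the abstract can be made concrete with a small measurement sketch. The Python below is illustrative only and is not taken from the paper: the `Step` and `Trajectory` types, the helper names, and the token counts are all hypothetical. It assumes step-level efficiency is measured as average generated tokens per step and trajectory-level efficiency as the number of steps in an episode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One agent turn (hypothetical type): tokens generated in that turn."""
    response_tokens: int


@dataclass
class Trajectory:
    """A full episode (hypothetical type): ordered steps plus task outcome."""
    steps: List[Step]
    success: bool


def tokens_per_step(traj: Trajectory) -> float:
    """Step-level efficiency proxy: mean tokens per step (lower is better)."""
    if not traj.steps:
        return 0.0
    return sum(s.response_tokens for s in traj.steps) / len(traj.steps)


def num_steps(traj: Trajectory) -> int:
    """Trajectory-level efficiency proxy: steps taken (lower is better)."""
    return len(traj.steps)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is smaller than `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - new) / baseline


# Made-up numbers for illustration; these are not results from the paper.
baseline = Trajectory([Step(200), Step(180), Step(220)], success=True)
efficient = Trajectory([Step(100), Step(100)], success=True)

token_cut = relative_reduction(tokens_per_step(baseline), tokens_per_step(efficient))
step_cut = relative_reduction(num_steps(baseline), num_steps(efficient))
print(f"tokens/step reduced by {token_cut:.1f}%, steps reduced by {step_cut:.1f}%")
```

Keeping the two measurements separate mirrors the dual-efficiency definition: an agent can write shorter responses yet take more steps, or the reverse, so a single total-token figure would hide which of the two changed.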

Code & Models

Datasets

OpenCausaLab/DEPO
dataset· 16 dl

Videos

DEPO: Dual-Efficiency Preference Optimization for LLM Agents· underline

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)