LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Yang Zhao; Zihao Li; Zhiyu Jiang; Dandan Ma; Ganchao Liu; Wenzhe Zhao

arXiv:2603.02680·cs.AI·March 4, 2026

Open Access

TL;DR

This paper introduces NAR-CP, a novel method for high-frequency decision-making with LLMs, using reward normalization and consistency loss to improve policy alignment and performance in UAV pursuit tasks.

Contribution

The paper proposes NAR-CP, combining reward normalization and consistency loss, to enhance LLMs' performance in high-frequency decision tasks, addressing policy misalignment issues.

Findings

01. Superior performance on UAV pursuit tasks

02. Effective generalization to unseen tasks

03. Improved policy alignment in composite tasks

Abstract

While Large Language Models (LLMs) form the cornerstone of sequential decision-making agent development, they have inherent limitations in high-frequency decision tasks. Existing research mainly focuses on discrete embodied decision scenarios that are low-frequency and have significant semantic differences in the state space (e.g., household planning). These methods perform poorly in high-frequency decision-making tasks for two reasons: the high-precision numerical state information in such tasks is updated frequently with minimal fluctuations, and the methods exhibit policy misalignment between the learned sub-tasks and the composite tasks. To address these issues, this paper proposes Normalized Action Reward guided Consistency Policy Optimization (NAR-CP). 1) Our method first acquires predefined dense rewards from environmental feedback of candidate actions via reward functions, then completes reward…
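
To illustrate why normalizing rewards across candidate actions can matter when state values barely change between steps, here is a minimal sketch. It is not the paper's implementation: the abstract is truncated before it describes the normalization, so the z-score scheme, the function name `normalize_action_rewards`, the `eps` parameter, and the reward values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def normalize_action_rewards(rewards, eps=1e-8):
    """Z-score normalize dense rewards across one step's candidate actions.

    Assumption: z-scoring is used here only as a common, simple choice;
    the source does not state which normalization NAR-CP applies.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the candidate set
    # eps keeps the division defined when all candidates score the same
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical dense rewards for three candidate actions at one time step.
# The raw values differ only in the fourth decimal place.
candidate_rewards = [0.9012, 0.9015, 0.9009]
normalized = normalize_action_rewards(candidate_rewards)

# Index of the candidate with the highest normalized reward
best_action = max(range(len(normalized)), key=normalized.__getitem__)
```

In this sketch the raw rewards are nearly indistinguishable, while the normalized values are spread around zero, so the relative ranking of candidate actions becomes a much stronger signal.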

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning