When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

Yuanhang Li

arXiv:2604.03562·cs.AI·April 7, 2026

TL;DR

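This paper tests whether adaptive reward weights help deep reinforcement learning (DRL) for multi-beam LEO satellite scheduling. It finds a switching-stability dilemma, in which weight changes repeatedly restart PPO convergence, and introduces single-variable causal probing to measure how much each reward term influences performance.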

Contribution

It uncovers the switching-stability dilemma in reward adaptation and introduces a causal probing method to analyze reward term impacts in LLM-guided DRL.

Findings

01

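Near-constant reward weights (342.1 Mbps) outperform carefully tuned dynamic weights (103.3 ± 96.8 Mbps) because PPO needs a quasi-stationary reward signal for its value function to converge.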

02

Probing reveals counterintuitive leverage: a +20% increase in the switching penalty yields +157 Mbps for polar handover.

03

MLP-based models outperform LLM fine-tuning in known and novel regimes.

Abstract

Adaptive reward design for deep reinforcement learning (DRL) in multi-beam LEO satellite scheduling is motivated by the intuition that regime-aware reward weights should outperform static ones. We systematically test this intuition and uncover a switching-stability dilemma: near-constant reward weights (342.1 Mbps) outperform carefully tuned dynamic weights (103.3 ± 96.8 Mbps) because PPO requires a quasi-stationary reward signal for value function convergence. Weight adaptation, regardless of quality, degrades performance by repeatedly restarting convergence. To understand why specific weights matter, we introduce a single-variable causal probing method that independently perturbs each reward term by ±20% and measures PPO response after 50k steps. Probing reveals counterintuitive leverage: a +20% increase in the switching penalty yields +157 Mbps for polar handover and +130 Mbps for…
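The single-variable probing idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the reward-term names (`throughput`, `switching_penalty`, `fairness`), the helper names, and the `toy_evaluate` function are all hypothetical. In the paper's setting, the evaluator would presumably train PPO for 50k steps under the given weights and report throughput in Mbps; here a cheap closed-form stand-in keeps the example runnable.

```python
from typing import Callable, Dict, List, Tuple

Weights = Dict[str, float]


def single_variable_probes(
    base: Weights, rel: float = 0.20
) -> List[Tuple[str, float, Weights]]:
    """Build one perturbed copy of `base` per (term, sign) pair.

    Each copy differs from `base` in exactly one reward weight,
    scaled by (1 + rel) or (1 - rel); all other weights are untouched.
    """
    probes = []
    for term in base:
        for sign in (+1.0, -1.0):
            perturbed = dict(base)  # copy so `base` is never mutated
            perturbed[term] = base[term] * (1.0 + sign * rel)
            probes.append((term, sign * rel, perturbed))
    return probes


def probe_leverage(
    base: Weights,
    evaluate: Callable[[Weights], float],
    rel: float = 0.20,
) -> Dict[Tuple[str, float], float]:
    """Return the change in `evaluate` relative to the baseline for each probe."""
    baseline = evaluate(base)
    return {
        (term, delta): evaluate(weights) - baseline
        for term, delta, weights in single_variable_probes(base, rel)
    }


def toy_evaluate(w: Weights) -> float:
    # HYPOTHETICAL stand-in for "train PPO for 50k steps, measure Mbps".
    # The coefficients are arbitrary and carry no empirical meaning.
    return 100.0 + 50.0 * w["switching_penalty"] - 10.0 * w["throughput"]


# Hypothetical baseline weights, chosen only for illustration.
base = {"throughput": 1.0, "switching_penalty": 0.5, "fairness": 0.2}
leverage = probe_leverage(base, toy_evaluate)

# Rank probes by absolute effect, largest first.
for (term, delta), effect in sorted(leverage.items(), key=lambda kv: -abs(kv[1])):
    print(f"{term:>18} {delta:+.0%}: {effect:+.2f}")
```

Because each probe changes only one weight, any difference from the baseline can be attributed to that term alone, which is what makes the comparison single-variable. With a real training run as the evaluator, each probe would need its own training budget and ideally several seeds, since the abstract reports large run-to-run variance for dynamic weights.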
