LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization