Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

Jian-Ting Guo; Yu-Cheng Chen; Ping-Chun Hsieh; Kuo-Hao Ho; Po-Wei Huang; Ti-Rong Wu; I-Chen Wu

arXiv:2511.15055 · cs.AI · November 20, 2025

TL;DR

This paper introduces Macro Action Quantization (MAQ), a novel framework that enhances the human-likeness of reinforcement learning agents by aligning their trajectories with human behavior through trajectory optimization and macro actions.
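The trajectory-optimization view of human-likeness can be written as a single objective. This is an illustrative sketch, not the paper's notation: the weight $\lambda$, horizon $H$, discount $\gamma$, deterministic dynamics $f$, distance $D$, and demonstration set $\mathcal{D}_{\text{human}}$ are all placeholders.

```latex
\max_{a_t,\dots,a_{t+H-1}} \;\; \sum_{k=0}^{H-1} \gamma^{k}\, r\!\left(s_{t+k}, a_{t+k}\right)
\;-\; \lambda\, D\!\left( (s_t, a_t, \dots, a_{t+H-1}),\; \mathcal{D}_{\text{human}} \right)
\quad \text{s.t.} \quad s_{t+k+1} = f\!\left(s_{t+k}, a_{t+k}\right)
```

Setting $\lambda = 0$ recovers ordinary reward maximization; larger $\lambda$ trades reward for closeness to human behavior.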

Contribution

The paper proposes MAQ, a new method that distills human demonstrations into macro actions to produce more human-like RL agents, improving interpretability and trustworthiness.
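As a rough illustration of distilling demonstrations into a finite set of macro actions by quantization, here is a minimal sketch. It assumes (hypothetically) that a macro action is a fixed-length chunk of `horizon` primitive actions and uses plain k-means as a stand-in quantizer. `chunk_trajectory`, `fit_codebook`, and `nearest_code` are illustrative names, not the authors' code, and MAQ's actual quantizer may differ.

```python
import random
from typing import List, Sequence

Vector = List[float]


def chunk_trajectory(actions: Sequence[Vector], horizon: int) -> List[Vector]:
    """Split one demonstration into non-overlapping chunks of `horizon` actions.

    Each chunk is flattened into a single vector; a trailing partial chunk is dropped.
    """
    chunks = []
    for start in range(0, len(actions) - horizon + 1, horizon):
        chunks.append([x for a in actions[start:start + horizon] for x in a])
    return chunks


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_code(x: Vector, codebook: List[Vector]) -> int:
    """Quantize: index of the codebook entry (macro action) closest to x."""
    return min(range(len(codebook)), key=lambda k: sq_dist(x, codebook[k]))


def fit_codebook(chunks: List[Vector], num_codes: int,
                 iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means over action chunks (a stand-in for a learned quantizer)."""
    rng = random.Random(seed)
    codebook = [list(c) for c in rng.sample(chunks, num_codes)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(num_codes)]
        for c in chunks:
            buckets[nearest_code(c, codebook)].append(c)
        for k, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous code
                n = len(members)
                codebook[k] = [sum(col) / n for col in zip(*members)]
    return codebook


# Toy 1-D demonstration: the "human" alternates between resting and pushing.
demo = [[0.0], [0.0], [1.0], [1.0], [0.0], [0.0], [1.0], [1.0]]
chunks = chunk_trajectory(demo, horizon=2)
codebook = fit_codebook(chunks, num_codes=2)
print(sorted(codebook))  # → [[0.0, 0.0], [1.0, 1.0]]
```

Each recovered code is a two-step "rest" or "push" pattern, so the agent's choices are restricted to behavior segments that actually occur in the demonstration.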

Findings

1. MAQ significantly increases trajectory similarity scores.
2. MAQ achieves the highest human-likeness rankings among the evaluated agents.
3. MAQ can be integrated into various RL algorithms.
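The third finding is easiest to see from the agent's side: once macro actions are fixed, an RL algorithm with a discrete action space can pick a macro index instead of a primitive action. The sketch below is an assumption about how such a wrapper could work, not the paper's interface; the `step` callable returning `(obs, reward, done)`, `execute_macro`, and the toy environment are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical environment interface: step(action) -> (observation, reward, done).
StepFn = Callable[[List[float]], Tuple[float, float, bool]]


def execute_macro(step: StepFn, codebook: List[List[float]], code_index: int,
                  action_dim: int, discount: float = 1.0) -> Tuple[float, float, bool]:
    """Decode macro action `code_index` and run its primitive actions open-loop.

    Returns the last observation, the discounted reward accumulated over the
    macro, and whether the episode ended (execution stops early if it does).
    """
    flat = codebook[code_index]
    total, scale = 0.0, 1.0
    obs, done = 0.0, False
    for start in range(0, len(flat), action_dim):
        obs, reward, done = step(flat[start:start + action_dim])
        total += scale * reward
        scale *= discount
        if done:
            break
    return obs, total, done


def make_line_env(goal: float) -> StepFn:
    """Toy 1-D environment: move by action[0], reward = distance moved, done at goal."""
    pos = [0.0]

    def step(action: List[float]) -> Tuple[float, float, bool]:
        pos[0] += action[0]
        return pos[0], action[0], pos[0] >= goal

    return step


codebook = [[1.0, 1.0], [0.0, -1.0]]  # two macro actions, each two 1-D steps long
step = make_line_env(goal=3.0)
print(execute_macro(step, codebook, 0, action_dim=1))  # (2.0, 2.0, False)
print(execute_macro(step, codebook, 0, action_dim=1))  # (3.0, 1.0, True): stops early
```

A value-based or policy-gradient learner would treat `len(codebook)` as its action-space size and call `execute_macro` once per decision, which is why this kind of wrapper is largely independent of the underlying RL algorithm.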

Abstract

Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been paid to designing human-like RL agents. As a result, reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro…
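The receding-horizon idea can be illustrated with a brute-force toy planner. Everything concrete here is an assumption for illustration: a known deterministic `model`, a small discrete `action_set`, a per-step human reference sequence, and a squared-distance penalty weighted by a hypothetical `lam`. It follows the classic scheme (plan over a window, commit to the first action, replan) rather than MAQ's specific implementation.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float, float], float]   # hypothetical dynamics: (state, action) -> next state
Reward = Callable[[float, float], float]  # hypothetical reward: (state, action) -> reward


def plan(state: float, horizon: int, action_set: Sequence[float], model: Model,
         reward_fn: Reward, human_ref: Sequence[float], lam: float) -> List[float]:
    """Exhaustively pick the action sequence maximizing
    (sum of rewards) - lam * (squared distance to the human reference)."""
    best_seq, best_score = [], float("-inf")
    for seq in product(action_set, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:
            ret += reward_fn(s, a)
            s = model(s, a)
        # zip stops at the shorter sequence, so a short reference only penalizes its overlap
        penalty = sum((a - h) ** 2 for a, h in zip(seq, human_ref))
        score = ret - lam * penalty
        if score > best_score:
            best_seq, best_score = list(seq), score
    return best_seq


def receding_horizon(state: float, steps: int, horizon: int, action_set: Sequence[float],
                     model: Model, reward_fn: Reward, human_actions: Sequence[float],
                     lam: float) -> Tuple[float, List[float]]:
    """Replan at every step, execute only the first planned action, then shift the window."""
    taken = []
    for t in range(steps):
        seq = plan(state, horizon, action_set, model, reward_fn,
                   human_actions[t:t + horizon], lam)
        taken.append(seq[0])
        state = model(state, seq[0])
    return state, taken


def move(s: float, a: float) -> float:
    return s + a


def gain(s: float, a: float) -> float:
    return a  # reward favors moving right


human = [0.0] * 10  # the "human" reference stands still
print(receding_horizon(0.0, 3, 2, (-1.0, 0.0, 1.0), move, gain, human, lam=0.0))   # (3.0, [1.0, 1.0, 1.0])
print(receding_horizon(0.0, 3, 2, (-1.0, 0.0, 1.0), move, gain, human, lam=10.0))  # (0.0, [0.0, 0.0, 0.0])
```

With `lam=0.0` the planner is purely reward-driven; a large `lam` makes it copy the reference instead. Exhaustive search over `product(action_set, repeat=horizon)` is only feasible for tiny toy problems.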

Videos

Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition · Multimodal Machine Learning Applications