Learning CLI Agents with Structured Action Credit under Selective Observation
Haoyang Su, Ying Wen

TL;DR
This paper trains CLI agents with reinforcement learning by using the structured attributes of CLI actions as learning signals and by selecting which observations the agent sees, targeting shell-driven information extraction and file editing tasks.
Contribution
It proposes $\sigma$-Reveal for context selection and $\text{A}^3$ for credit assignment, addressing key bottlenecks in CLI agent learning.
Findings
$\sigma$-Reveal enhances context relevance in CLI tasks.
$\text{A}^3$ improves credit assignment in multi-turn trajectories.
ShellOps dataset enables benchmarking of CLI agent performance.
Abstract
Command line interface (CLI) agents are emerging as a practical paradigm for agent-computer interaction over evolving filesystems, executable command line programs, and online execution feedback. Recent work has used reinforcement learning (RL) to learn these interaction abilities from verifiable task feedback, yet few methods exploit the native structured attributes of CLI actions as learning signals. Beyond this underused action structure, CLI learning also couples two bottlenecks for coding agents. First, the agent must identify task-relevant evidence in a large codebase from partial observations. Second, sparse terminal rewards must be assigned to the actions that shape a long multi-turn trajectory. We study these bottlenecks through shell-driven information extraction and file editing tasks. For selective observation, we introduce $\sigma$-Reveal, an inference-time mechanism that…
