Action abstractions for amortized sampling
Oussama Boussif, Léna Néhale Ezzine, Joseph D Viviano, Michał Koziarski, Moksh Jain, Nikolay Malkin, Emmanuel Bengio, Rim Assouel, Yoshua Bengio

TL;DR
This paper introduces a method for incorporating high-level action abstractions into reinforcement learning policies, improving exploration and sample efficiency in complex environments by chunking common action subsequences.
Contribution
It presents a novel iterative approach to extract and incorporate action abstractions into RL, enabling hierarchical planning and better exploration in long-horizon tasks.
Findings
Enhanced sample efficiency in discovering high-reward objects.
Abstracted actions are interpretable and reveal latent reward structure.
Improved performance on challenging exploration problems.
Abstract
As trajectories sampled by policies used by reinforcement learning (RL) and generative flow networks (GFlowNets) grow longer, credit assignment and exploration become more challenging, and the long planning horizon hinders mode discovery and generalization. The challenge is particularly pronounced in entropy-seeking RL methods, such as generative flow networks, where the agent must learn to sample from a structured distribution and discover multiple high-reward states, each of which takes many steps to reach. To tackle this challenge, we propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process. Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space. In empirical evaluation on…
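The chunking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a byte-pair-encoding-style rule (merge the most frequent adjacent action pair among the highest-reward trajectories), string-valued actions, and a hypothetical `keep_fraction` threshold for what counts as "high-reward". All function names are illustrative.

```python
from collections import Counter


def top_trajectories(trajectories, rewards, keep_fraction=0.5):
    """Return the highest-reward trajectories (at least one)."""
    ranked = sorted(zip(rewards, range(len(trajectories))), reverse=True)
    n_keep = max(1, int(len(trajectories) * keep_fraction))
    return [trajectories[i] for _, i in ranked[:n_keep]]


def most_common_pair(trajectories):
    """Count adjacent action pairs and return the most frequent one."""
    counts = Counter()
    for traj in trajectories:
        counts.update(zip(traj, traj[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(traj, pair, new_action):
    """Replace each non-overlapping occurrence of `pair` with `new_action`."""
    out, i = [], 0
    while i < len(traj):
        if i + 1 < len(traj) and (traj[i], traj[i + 1]) == pair:
            out.append(new_action)
            i += 2
        else:
            out.append(traj[i])
            i += 1
    return out


def chunk_once(action_space, trajectories, rewards, keep_fraction=0.5):
    """One chunking round: add one macro-action and re-tokenize trajectories.

    Assumption: a macro-action is named by concatenating its two parts.
    """
    elite = top_trajectories(trajectories, rewards, keep_fraction)
    pair = most_common_pair(elite)
    if pair is None:
        return action_space, trajectories, None
    new_action = pair[0] + pair[1]
    if new_action not in action_space:
        action_space = action_space + [new_action]
    merged = [merge_pair(t, pair, new_action) for t in trajectories]
    return action_space, merged, new_action


# Toy data: four trajectories over primitive actions A, B, C.
trajectories = [list("ABAB"), list("ABCA"), list("CCCC"), list("CACA")]
rewards = [1.0, 0.9, 0.1, 0.2]
action_space, merged, new_action = chunk_once(["A", "B", "C"], trajectories, rewards)
print(new_action, action_space, merged[:2])
```

Calling `chunk_once` repeatedly, with fresh trajectories sampled from the updated policy between rounds, would mirror the iterative loop the abstract describes; how the policy is retrained over the enlarged action space is outside the scope of this sketch.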
Peer Reviews
Decision·ICLR 2025 Poster
- The proposed ACTIONPIECE is compatible with both RL and GFlowNet samplers.
- Empirical evaluation shows improved sample efficiency and mode discovery in different environments.
- As mentioned in the related work section, the discovery of macro-actions has been extensively studied. The authors should provide a more detailed discussion highlighting how the proposed method differs from existing methods.
- In Line 159, the paper appears to assume a deterministic state transition, where $s'=s+a$. This assumption may be too strong and not applicable to real-world environments where state transitions involve a degree of randomness. The reviewers are concerned about the gene
- The idea of dynamically expanding the action space online with temporally extended action sequences is interesting and novel.
- The experiments are comprehensive and thorough, with insightful analyses and visualizations that demonstrate the effectiveness of the proposed algorithm.
*Unfounded claim*
- "the abstracted high-level actions are interpretable, …": the paper presents no evidence showing that the high-level actions are interpretable.
*Comparison to prior chunking mechanisms is limited*
- The authors considered two new chunking mechanisms, "ActionPiece-Increment" and "ActionPiece-Replace". Both use heuristics to expand the action space with temporally extended action sequences.
- It is unclear how these mechanisms compare to prior chunking
Overall, this paper is well presented and describes its method clearly. I particularly appreciate the selection of real-world-oriented environments and the informative discussion.
(W1) Minor typos, e.g., 'the the' at line 319. (W2) It seems that all experiments are averaged over only three seeds (per lines 348 and 507), which is not enough to demonstrate statistical significance in some settings.
Taxonomy
Topics: Topological and Geometric Data Analysis
