# LazyAct: Lazy actor with dynamic state skip based on constrained MDP

**Authors:** Hongjie Zhang, Zhenyu Chen, Hourui Deng, Chaosheng Feng, Manoharan Premkumar

PMC · DOI: 10.1371/journal.pone.0318778 · PLOS ONE · 2025-02-06

## TL;DR

LazyAct is a reinforcement learning algorithm that reduces computational costs by skipping non-critical states during decision-making.

## Contribution

Introduces LazyAct, an algorithm that adds a state-skipping branch to the actor network so that low-impact states are bypassed, reducing the number of inferences with minimal loss in policy performance.
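The state-skipping idea can be illustrated with a toy rollout loop. This is a minimal sketch, not the authors' implementation: `ToyLazyActor`, its fixed skip length, the toy dynamics, and the choice to reuse the previous action on skipped states are all assumptions made for illustration. In the paper, the skip decision comes from a learned branch of the actor network.

```python
class ToyLazyActor:
    """Hypothetical stand-in for an actor with an extra skip output."""

    def __init__(self, max_skip):
        self.max_skip = max_skip   # assumed fixed; a learned branch would vary it
        self.inferences = 0        # counts simulated forward passes

    def infer(self, state):
        # One simulated forward pass: returns an action and the number of
        # following states to skip.
        self.inferences += 1
        action = 1 if state >= 0 else -1
        return action, self.max_skip


def rollout(actor, steps):
    """Run a toy episode and return how many inferences were needed."""
    state, action, remaining_skip = 0.0, 0, 0
    for _ in range(steps):
        if remaining_skip == 0:
            # Critical state: query the actor.
            action, remaining_skip = actor.infer(state)
        else:
            # Skipped state: reuse the previous action (assumption), no inference.
            remaining_skip -= 1
        state -= 0.1 * action      # toy dynamics, purely illustrative
    return actor.inferences
```

With `max_skip=0` the actor is queried at every step, which corresponds to a conventional policy. Larger skip lengths lower the inference count in proportion to how many states are bypassed.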

## Key findings

- LazyAct reduces the number of inferences by approximately 80% in single-agent scenarios and approximately 40% in multi-agent scenarios.
- Policy performance remains comparable, while the time and FLOPs required to complete tasks decrease significantly.
- The policy network is trained under cost constraints using pre-training and fine-tuning techniques.
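To see how a lower inference count relates to compute, the arithmetic can be sketched as follows. The sketch assumes that every inference costs the same number of FLOPs and that the overhead of the skip branch is negligible. Neither assumption is stated in this summary, and the step count and per-inference cost below are hypothetical placeholders, not measurements from the paper.

```python
def inference_count(steps, reduction_pct):
    """Inferences left after removing `reduction_pct` percent of them."""
    return steps * (100 - reduction_pct) // 100


def total_flops(steps, reduction_pct, flops_per_inference):
    """Episode cost under the constant-cost-per-inference assumption."""
    return inference_count(steps, reduction_pct) * flops_per_inference


# Hypothetical episode of 1000 steps with a 1 MFLOP actor network.
single_agent = total_flops(1000, 80, 1_000_000)   # 80% fewer inferences
multi_agent = total_flops(1000, 40, 1_000_000)    # 40% fewer inferences
baseline = total_flops(1000, 0, 1_000_000)        # inference at every step
```

Under these assumptions the compute saving is linear in the fraction of inferences removed.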

## Abstract

Deep reinforcement learning has achieved significant success in complex decision-making tasks. However, the high computational cost of policies based on deep neural networks restricts their practical application. Specifically, each decision made by an agent requires a complete neural network computation, so the computational cost increases linearly with the number of interactions and agents. Humans, by contrast, reason only about critical states in continuous decision-making tasks rather than considering every state. Inspired by this pattern, we introduce the LazyAct algorithm, which significantly reduces the number of inferences while preserving the quality of the policy. First, we incorporate a state-skipping branch into the actor network to bypass states with minimal impact. Next, we establish optimization objectives for single-agent and multi-agent inference, incorporating cost constraints based on the IMPALA and MAPPO frameworks, respectively. Finally, we use pre-training and fine-tuning techniques to train the policy network. Extensive experimental results indicate that LazyAct reduces the number of inferences by approximately 80% and 40% in single-agent and multi-agent scenarios, respectively, while sustaining comparable policy performance. This reduction in inferences significantly decreases the time and FLOPs that LazyAct requires to complete tasks. Code is available at https://www.dropbox.com/scl/fo/wyoqo6q9gyt86zobfgbvx/h?rlkey=0moyxsnoiisfs9y4h89hsou1l&dl=0.
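The cost constraints mentioned in the abstract are typical of a constrained MDP, where reward is maximized subject to a bound on an expected cost (here, presumably the inference cost). A common way to handle such a constraint is Lagrangian relaxation with a dual update on the multiplier. The sketch below shows that generic technique only. The exact objective LazyAct uses with IMPALA and MAPPO is not given in this summary, and all function and parameter names are hypothetical.

```python
def lagrangian_objective(reward_return, cost_return, cost_limit, lam):
    """Scalar to maximize: reward minus the penalized constraint violation."""
    return reward_return - lam * (cost_return - cost_limit)


def update_multiplier(lam, cost_return, cost_limit, lr=0.05):
    """Dual ascent step: raise lam when cost exceeds the limit, never below 0."""
    return max(0.0, lam + lr * (cost_return - cost_limit))


# Hypothetical usage: the policy infers on 80% of steps but the budget is 20%.
lam = 0.0
for _ in range(3):
    lam = update_multiplier(lam, cost_return=0.8, cost_limit=0.2, lr=0.5)
    penalized = lagrangian_objective(10.0, 0.8, 0.2, lam)
```

While the cost stays above the limit, the multiplier keeps growing, which makes inference increasingly expensive in the objective. Once the constraint is satisfied, the multiplier decays toward zero.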

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11801576/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11801576/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC11801576/full.md

---
Source: https://tomesphere.com/paper/PMC11801576