FM-EAC: Feature Model-based Enhanced Actor-Critic for Multi-Task Control in Dynamic Environments
Quanxi Zhou, Wencan Mao, Manabu Tsukada, John C.S. Lui, and Yusheng Ji

TL;DR
FM-EAC is a novel reinforcement learning algorithm that combines model-based and model-free approaches with feature-based models to improve multi-task control and transferability in dynamic environments, demonstrated through urban and agricultural simulations.
Contribution
The paper introduces FM-EAC, a generalized algorithm integrating planning, acting, and learning with feature models and an enhanced actor-critic for better multi-task transferability.
Findings
FM-EAC outperforms state-of-the-art methods in simulations.
It offers customizable sub-networks for user-specific needs.
It demonstrates effectiveness in urban and agricultural scenarios.
Abstract
Model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) evolve along distinct paths but converge in the design of Dyna-Q [1]. However, modern RL methods still struggle with effective transferability across tasks and scenarios. Motivated by this limitation, we propose a generalized algorithm, Feature Model-Based Enhanced Actor-Critic (FM-EAC), that integrates planning, acting, and learning for multi-task control in dynamic environments. FM-EAC combines the strengths of MBRL and MFRL and improves generalizability through the use of novel feature-based models and an enhanced actor-critic framework. Simulations in both urban and agricultural applications demonstrate that FM-EAC consistently outperforms many state-of-the-art MBRL and MFRL methods. More importantly, different sub-networks can be customized within FM-EAC according to user-specific requirements.
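As background for the abstract's reference to Dyna-Q, the loop of acting, learning, and planning can be sketched in tabular form. This is a minimal sketch of classic Dyna-Q, not of FM-EAC: the corridor environment, the function name `dyna_q`, and all hyperparameter values are illustrative assumptions that do not come from the paper.

```python
import random
from collections import defaultdict


def dyna_q(n_states=6, episodes=50, planning_steps=10,
           alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a toy 1-D corridor (hypothetical environment).

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)  # action values, keyed by (state, action)
    model = {}              # learned model: (s, a) -> (reward, next_state, done)

    def step(s, a):
        # True (deterministic) environment dynamics.
        s2 = max(0, s - 1) if a == 0 else s + 1
        done = s2 == n_states - 1
        return (1.0 if done else 0.0), s2, done

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update, used for real and simulated transitions.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Acting: epsilon-greedy choice in the real environment.
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            # Learning: model-free update from the real transition.
            update(s, a, r, s2, done)
            # Model learning: memorize the observed transition.
            model[(s, a)] = (r, s2, done)
            # Planning: replay transitions sampled from the learned model.
            for _ in range(planning_steps):
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q
```

Calling `q = dyna_q()` returns the learned action values; in this toy corridor, moving right ends up valued above moving left in every non-terminal state. Setting `planning_steps=0` reduces the sketch to plain Q-learning, which isolates what the model-based planning step contributes.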
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. **Interesting application domain** - the focus on UAV-based multi-task control is practically relevant and demonstrates potential impact for real-world RL deployment.
2. **Broad contextualization** - the paper cites a wide range of MFRL and MBRL baselines
1. **Lack of novelty and conceptual depth** - despite the title, FM-EAC is effectively a standard actor-critic (close to SAC/TD3) with auxiliary feature-extraction modules. There is no genuine model-based component (no learned dynamics model, planning step, or synthetic rollout), contradicting the model-based claim. The "enhanced" actor-critic merely duplicates critics for primary/secondary tasks, which is a known trick from multi-objective and multi-head architectures.
2. **Pedagogical rather t
This article is written smoothly and summarizes the work done in concise language. Meanwhile, FM-EAC has excellent generalization ability and can be applied in multiple engineering tasks, making significant progress. Additionally, Figure 1 in this article is very beautiful and impressive.
Lack of display of experimental results. The author applied FM-EAC to unmanned aerial vehicle control tasks and several other tasks in the fields of industrial and agricultural engineering. However, these experiments lack specific implementation details and do not provide image or video displays to aid understanding. This confuses readers and raises doubts about the authenticity of the article.
1. The paper addresses a practically relevant problem of multi-task UAV control in dynamic environments with real-world applications.
2. The modular feature model design allowing customizable sub-networks (GNN, PAN, BPN) for different scenarios is a useful architectural choice.
3. The paper provides detailed mathematical formulations and algorithms for the enhanced actor-critic framework.
1. Technical Contributions and Novelty. The enhanced actor-critic is a straightforward extension using separate critics for different rewards, which lacks novelty. The paper provides no theoretical justification (convergence, sample complexity) for why feature models improve generalization, and fails to compare against the most relevant meta-RL methods.
2. Experimental Evaluation. Only one MBRL baseline (MBPO) and no meta-RL methods are compared, while the train/test setup is vague ("3-5 out of
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
