Post Hoc Extraction of Pareto Fronts for Continuous Control
Raghav Thakar, Gaurav Dixit, Kagan Tumer

TL;DR
This paper introduces MAPEX, a method that extracts Pareto fronts for continuous control by reusing pre-trained policies, enabling efficient multi-objective trade-off analysis without retraining from scratch.
Contribution
MAPEX is a novel offline MORL approach that reuses pre-trained specialists to construct Pareto fronts efficiently, substantially reducing sample cost compared with retraining from scratch.
Findings
MAPEX produces comparable Pareto fronts with minimal sample cost.
MAPEX effectively reuses pre-trained policies and critics.
MAPEX outperforms existing methods in sample efficiency.
Abstract
Agents in the real world must often balance multiple objectives, such as speed, stability, and energy efficiency in continuous control. To account for changing conditions and preferences, an agent should ideally learn a Pareto frontier of policies representing multiple optimal trade-offs. Recent advances in multi-policy multi-objective reinforcement learning (MORL) enable learning a Pareto front directly, but require full multi-objective consideration from the start of training. In practice, multi-objective preferences often arise after a policy has already been trained on a single specialised objective. Existing MORL methods cannot leverage these pre-trained 'specialists' to learn Pareto fronts and avoid incurring the sample costs of retraining. We introduce Mixed Advantage Pareto Extraction (MAPEX), an offline MORL method that constructs a frontier of policies by reusing pre-trained…
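The idea of a Pareto frontier of policies can be made concrete with a minimal sketch of Pareto-dominance filtering. This is not MAPEX itself; it only shows which candidate policies would belong to a front, assuming each policy has already been summarised by a vector of mean returns (one entry per objective, every objective maximised). The function names, objectives, and numbers below are hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if return vector `a` Pareto-dominates `b` (maximisation):
    `a` is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(returns: List[Tuple[float, ...]]) -> List[int]:
    """Indices of candidate policies that no other candidate dominates."""
    front = []
    for i, r in enumerate(returns):
        if not any(dominates(other, r) for j, other in enumerate(returns) if j != i):
            front.append(i)
    return front


# Hypothetical mean returns for four policies on two objectives:
# (speed, energy efficiency). The first two play the role of specialists.
candidate_returns = [
    (10.0, 1.0),  # speed specialist
    (2.0, 9.0),   # efficiency specialist
    (6.0, 6.0),   # a trade-off policy
    (5.0, 5.0),   # dominated by the (6.0, 6.0) policy
]
print(pareto_front(candidate_returns))  # [0, 1, 2]
```

The two specialists sit at the extremes of the front; a multi-policy method is valuable precisely because it also supplies the intermediate trade-offs, such as the (6.0, 6.0) policy, that neither specialist covers.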
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Explainable Artificial Intelligence (XAI)
