# Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

**Authors:** Yan Li, Zhaohan Sun

arXiv: 1705.03597 · 2017-05-11

## TL;DR

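This paper evaluates a policy by the sequence of quantiles it induces on a set of target states, reformulates that problem as a multi-objective Markov Decision Process (MDP) with lexicographic preferences, and proposes the FLMDP algorithm to solve it. The aim is to handle risk aversion and multiple objectives in stochastic planning better than expectation-based criteria do.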

## Contribution

The paper reformulates quantile-based policy evaluation as a multi-objective MDP with a naturally defined lexicographic preference, and proposes the FLMDP algorithm for solving general multi-objective MDPs with lexicographic reward preferences.

## Key findings

- The FLMDP algorithm is proposed to solve general multi-objective MDPs with lexicographic reward preferences.
- Quantile-based evaluation is offered as a more robust alternative to expectation-based criteria in risk-averse settings, particularly when the distribution of accumulated reward is heavily skewed.
- Autonomous driving, where an agent must balance speed and safety, serves as the motivating example of a multi-objective setting.
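
To make the contrast between expectation-based and quantile-based evaluation concrete, here is a minimal sketch on made-up numbers. It is not taken from the paper: the helper `lower_quantile`, the level `0.05`, and both sample sets are assumptions chosen only to show how a skewed return distribution can rank two policies differently under the two criteria.

```python
import math
import statistics


def lower_quantile(samples, tau):
    """Return the smallest sample x whose empirical CDF value is at least tau."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    ordered = sorted(samples)
    # Index of the first order statistic covering a fraction tau of the samples.
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


# Hypothetical accumulated rewards for two policies (illustrative numbers only).
risky_returns = [20.0] * 90 + [-100.0] * 10  # usually good, occasionally very bad
safe_returns = [7.0] * 100                   # always the same modest return

mean_risky = statistics.mean(risky_returns)
mean_safe = statistics.mean(safe_returns)
q_risky = lower_quantile(risky_returns, 0.05)
q_safe = lower_quantile(safe_returns, 0.05)

print("mean:", mean_risky, mean_safe)
print("0.05-quantile:", q_risky, q_safe)
```

In this toy setting the mean prefers the risky policy, while the 0.05-quantile prefers the safe one, which is the kind of disagreement that motivates a quantile criterion for risk-averse planning.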

## Abstract

In the most common settings of the Markov Decision Process (MDP), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards. In many applications, however, this criterion may be unsuitable for two reasons. First, in risk-averse situations the expectation of accumulated rewards is not robust enough, as is the case when the distribution of accumulated reward is heavily skewed. Second, many applications naturally take several objectives into consideration when evaluating a policy; in autonomous driving, for instance, an agent needs to balance speed and safety when choosing an appropriate decision. In this paper, we consider evaluating a policy based on the sequence of quantiles it induces on a set of target states. Our idea is to reformulate the original problem as a multi-objective MDP with a naturally defined lexicographic preference. To compute an optimal policy, we propose an algorithm, **FLMDP**, that can solve general multi-objective MDPs with lexicographic reward preference.
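
The idea of a lexicographic preference over several reward functions can be sketched as follows. This is not the paper's FLMDP algorithm, whose details are not given in this summary. It is a generic illustration for a finite, discounted MDP: optimize the highest-priority objective by value iteration, keep only the actions that are optimal for it in each state, then optimize the next objective over the remaining actions. All names, the toy MDP, the discount factor, and the tolerance are assumptions.

```python
def q_value(P, R, V, gamma, s, a):
    """One-step lookahead value of action a in state s."""
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(P, R, allowed, gamma, iters=500):
    """Value iteration restricted to the action sets in `allowed`."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(q_value(P, R, V, gamma, s, a) for a in allowed[s])
             for s in range(len(P))]
    return V


def lexicographic_policy(P, rewards, gamma=0.9, tol=1e-8):
    """Filter actions objective by objective, highest priority first."""
    n = len(P)
    allowed = [list(range(len(P[s]))) for s in range(n)]
    for R in rewards:
        V = value_iteration(P, R, allowed, gamma)
        # Keep only actions that are (near-)optimal for this objective.
        allowed = [[a for a in allowed[s]
                    if q_value(P, R, V, gamma, s, a) >= V[s] - tol]
                   for s in range(n)]
    return [acts[0] for acts in allowed], allowed


# Toy MDP (hypothetical): state 0 is the start, state 1 is absorbing.
# P[s][a] is a list of (probability, next_state) pairs.
P = [
    [[(1.0, 1)], [(1.0, 1)], [(1.0, 1)]],  # state 0: fast, slow, medium
    [[(1.0, 1)]],                          # state 1: stay
]
safety = [[-1.0, 0.0, 0.0], [0.0]]  # first priority: penalize the fast action
speed = [[5.0, 1.0, 3.0], [0.0]]    # second priority: reward quicker actions

policy, allowed = lexicographic_policy(P, [safety, speed])
print(policy)
```

Here safety rules out the fast action in state 0, and speed then breaks the tie between the two safe actions in favor of the medium one, so the sketch selects action 2 in state 0 even though a speed-only criterion would pick action 0.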

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03597/full.md

## References

10 references — full list in the complete paper: https://tomesphere.com/paper/1705.03597/full.md

---
Source: https://tomesphere.com/paper/1705.03597