Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

Aadyot Bhatnagar; Peter Mørch Groth; Ali Madani

arXiv:2604.13175·cs.LG·April 17, 2026

TL;DR

This paper introduces STOMP, a novel offline reinforcement learning algorithm that uses smooth Tchebysheff scalarization to effectively optimize multiple conflicting objectives, demonstrated on protein engineering tasks.

Contribution

The paper develops STOMP, a new multi-objective offline RL method that overcomes linear scalarization limitations using smooth Tchebysheff scalarization, with empirical validation on protein datasets.

Findings

01 STOMP achieves the highest hypervolumes in 8 of 9 settings.

02 It outperforms state-of-the-art baselines in multi-objective protein optimization.

03 STOMP is robust and improves post-trained models for multi-attribute tasks.
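
Hypervolume, the metric cited in the first finding, measures how much of objective space a set of solutions dominates relative to a reference point; larger is better. The following is a minimal two-objective sketch, assuming both objectives are maximized. The function name `hypervolume_2d`, the reference point, and the sample points are illustrative assumptions and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` and bounded below by `ref` (maximization).

    Illustrative helper only; not the paper's evaluation code.
    """
    # Keep only points that are strictly better than the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    pts.sort(key=lambda p: p[0], reverse=True)

    area = 0.0
    best_y = ref[1]  # highest second objective seen so far
    for x, y in pts:
        if y > best_y:
            # This point adds a new horizontal strip above the previous ones.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


# Hypothetical solution set with two objectives.
front = [(1.0, 0.2), (0.6, 0.6), (0.2, 1.0)]
hv = hypervolume_2d(front)

# A dominated point should leave the hypervolume unchanged.
hv_with_dominated = hypervolume_2d(front + [(0.5, 0.5)])
```

Because dominated points contribute no new area, the metric rewards both pushing the front outward and covering it broadly, which is why it is a common summary statistic for multi-objective methods.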

Abstract

Large language models can be aligned with human preferences through offline reinforcement learning (RL) on small labeled datasets. While single-objective alignment is well-studied, many real-world applications demand the simultaneous optimization of multiple conflicting rewards, e.g. optimizing both catalytic activity and specificity in protein engineering, or helpfulness and harmlessness for chatbots. Prior work has largely relied on linear reward scalarization, but this approach provably fails to recover non-convex regions of the Pareto front. In this paper, instead of scalarizing the rewards directly, we frame multi-objective RL itself as an optimization problem to be scalarized via smooth Tchebysheff scalarization, a recent technique that overcomes the shortcomings of linear scalarization. We use this formulation to derive Smooth Tchebysheff Optimization of Multi-Objective…
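
The abstract's claim about linear scalarization can be illustrated with a toy example. The sketch below uses the standard smooth Tchebysheff form from the multi-objective optimization literature, mu * log(sum_i exp(w_i * (z_i - r_i) / mu)), where z is an ideal point and lower values are better. It is not the STOMP objective itself, which the truncated abstract does not spell out. The candidate rewards, weights, ideal point, and smoothing parameter `mu` are all assumptions chosen for illustration.

```python
import math


def linear_scalarization(rewards, weights):
    """Weighted sum of rewards; higher is better."""
    return sum(w * r for w, r in zip(weights, rewards))


def smooth_tchebysheff(rewards, weights, ideal, mu=0.1):
    """Smooth approximation of max_i w_i * (ideal_i - r_i); lower is better."""
    terms = [w * (z - r) / mu for w, r, z in zip(weights, rewards, ideal)]
    m = max(terms)  # subtract the max for a numerically stable log-sum-exp
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


# Hypothetical candidates with two rewards each. C is Pareto-optimal but lies
# below the segment joining A and B, i.e. in a non-convex part of the front.
candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}
ideal = (1.0, 1.0)

# Linear scalarization: sweep the weight on the first reward from 0 to 1.
linear_winners = set()
for i in range(101):
    w = (i / 100, 1 - i / 100)
    best = max(candidates, key=lambda k: linear_scalarization(candidates[k], w))
    linear_winners.add(best)

# Smooth Tchebysheff scalarization with equal weights.
stch_winner = min(
    candidates,
    key=lambda k: smooth_tchebysheff(candidates[k], (0.5, 0.5), ideal),
)
```

In this toy setup no weight vector makes the weighted sum prefer C, since max(w, 1 - w) is at least 0.5 while C scores 0.4, whereas the equal-weight smooth Tchebysheff score is lowest for C.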
