Enhancing Offline Reinforcement Learning with Curriculum Learning-Based Trajectory Valuation

Amir Abolfazli; Zekun Song; Avishek Anand; Wolfgang Nejdl

arXiv:2502.00601 · cs.LG · April 15, 2025

Open Access

TL;DR

This paper introduces CLTV, a curriculum learning approach that uses transition scoring to improve offline reinforcement learning by effectively prioritizing high-quality trajectories, especially in mixed datasets with domain mismatch.

Contribution

The paper proposes CLTV, a novel trajectory valuation method using transition scores and curriculum learning to address domain mismatch in offline RL with mixed datasets.
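This page does not say how CLTV computes its transition scores or schedules its curriculum. The sketch below only illustrates the general idea of turning per-transition scores into trajectory values and using them to prioritize trajectories over training stages. It is not the authors' algorithm, and it rests on three assumptions. First, per-transition scores are already given, with higher meaning more target-like, for example from some hypothetical domain classifier. Second, a trajectory's value is the mean of its transition scores. Third, the curriculum keeps a linearly shrinking top fraction of trajectories. The names `trajectory_value`, `curriculum_indices`, and `min_fraction` are hypothetical.

```python
import math
from typing import List, Sequence


def trajectory_value(transition_scores: Sequence[float]) -> float:
    # Hypothetical valuation: average the per-transition scores of one trajectory.
    if not transition_scores:
        return 0.0
    return sum(transition_scores) / len(transition_scores)


def curriculum_indices(values: Sequence[float], stage: int, num_stages: int,
                       min_fraction: float = 0.2) -> List[int]:
    # Hypothetical schedule: stage 0 trains on every trajectory; the kept
    # fraction then shrinks linearly to min_fraction at the final stage,
    # so later stages concentrate on the highest-valued trajectories.
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    if num_stages == 1:
        fraction = 1.0
    else:
        fraction = 1.0 - (1.0 - min_fraction) * stage / (num_stages - 1)
    k = max(1, math.ceil(fraction * len(values)))
    # Rank trajectory indices from highest to lowest value and keep the top k.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return ranked[:k]


# Usage with made-up scores for four trajectories of different lengths.
scores = [[0.9, 0.8], [0.1, 0.2, 0.3], [0.6], [0.4, 0.5]]
values = [trajectory_value(s) for s in scores]
for stage in range(3):
    print(stage, curriculum_indices(values, stage, 3, min_fraction=0.25))
```

Starting broad and narrowing is only one possible curriculum direction. A schedule that begins with the best trajectories and gradually admits lower-valued ones would be an equally plausible reading of "curriculum learning" here.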

Findings

01

CLTV improves offline RL performance across various algorithms.

02

The method enhances transferability of policies in MuJoCo environments.

03

Theoretical analysis supports the effectiveness of CLTV.

Abstract

The success of deep reinforcement learning (DRL) relies on the availability and quality of training data, often requiring extensive interactions with specific environments. In many real-world scenarios, where data collection is costly and risky, offline reinforcement learning (RL) offers a solution by utilizing data collected by domain experts and searching for a batch-constrained optimal policy. This approach is further augmented by incorporating external data sources, expanding the range and diversity of data collection possibilities. However, existing offline RL methods often struggle with challenges posed by non-matching data from these external sources. In this work, we specifically address the problem of source-target domain mismatch in scenarios involving mixed datasets, characterized by a predominance of source data generated from random or suboptimal policies and a limited…


Taxonomy

Topics: Software Reliability and Analysis Research · Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety