A Dantzig Selector Approach to Temporal Difference Learning

Matthieu Geist (Supelec); Bruno Scherrer (INRIA Nancy); Alessandro Lazaric (INRIA Lille); Mohammad Ghavamzadeh (INRIA Lille)

arXiv:1206.6480·cs.LG·July 3, 2012·ICML·24 cites


Open Access

TL;DR

This paper introduces a novel algorithm combining LSTD with the Dantzig Selector to improve high-dimensional value function approximation in reinforcement learning, addressing limitations of existing regularization methods.

Contribution

It presents a new integration of LSTD with the Dantzig Selector, offering an alternative regularization approach for high-dimensional TD learning.

Findings

01 Demonstrates improved feature selection in high-dimensional settings

02 Shows the proposed method addresses drawbacks of L1-regularization approaches

03 Provides theoretical analysis of the algorithm's performance

Abstract

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, L1-regularization methods tend to perform feature selection by promoting sparsity, and are thus well suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm but solves a fixed-point problem, its integration with L1-regularization is not straightforward and might come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks.
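To make the idea in the abstract concrete, here is a minimal sketch of how a Dantzig Selector can be applied to the LSTD linear system. It assumes the usual LSTD quantities A = Phi^T (Phi - gamma * Phi') / n and b = Phi^T r / n, and reads the Dantzig-style estimate as: minimize ||theta||_1 subject to ||A theta - b||_inf <= lambda, which is a linear program. The function names, the 1/n scaling, the synthetic data, and the use of SciPy's `linprog` are illustrative assumptions, not details taken from this page.

```python
import numpy as np
from scipy.optimize import linprog


def lstd_system(phi, phi_next, rewards, gamma):
    """Build the empirical LSTD system A theta = b from n sampled transitions.

    phi, phi_next: (n, d) feature matrices of the current and next states.
    rewards: (n,) vector of observed rewards.
    """
    n = phi.shape[0]
    A = phi.T @ (phi - gamma * phi_next) / n
    b = phi.T @ rewards / n
    return A, b


def dantzig_lstd(phi, phi_next, rewards, gamma, lam):
    """Sketch: minimize ||theta||_1 subject to ||A theta - b||_inf <= lam.

    Solved as a linear program by splitting theta = u - v with u, v >= 0.
    """
    A, b = lstd_system(phi, phi_next, rewards, gamma)
    d = A.shape[1]
    cost = np.ones(2 * d)                      # sum(u) + sum(v) = ||theta||_1
    A_ub = np.block([[A, -A], [-A, A]])        # A theta - b <= lam and b - A theta <= lam
    b_ub = np.concatenate([lam + b, lam - b])
    # linprog's default bounds are (0, None), matching u, v >= 0.
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]


# Hypothetical synthetic data, only to exercise the functions above.
rng = np.random.default_rng(0)
n, d, gamma = 50, 10, 0.9
phi = rng.standard_normal((n, d))
phi_next = rng.standard_normal((n, d))
rewards = rng.standard_normal(n)
theta = dantzig_lstd(phi, phi_next, rewards, gamma, lam=0.05)
```

In this sketch, a larger `lam` loosens the constraint and drives more coefficients to zero, while `lam = 0` requires the LSTD system to be satisfied exactly, which ties the estimate back to plain LSTD.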

