Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task

Yunho Kim; Jaehyun Park; Heejun Kim; Sejin Kim; Byung-Jun Lee; Sundong Kim

arXiv:2410.11324·cs.AI·October 16, 2024

Open Access

TL;DR

This paper introduces SOLAR, a new dataset and generator to enable offline RL for complex decision-making in ARC, demonstrating improved reasoning in a simple task.

Contribution

We created SOLAR and the SOLAR-Generator to provide sufficient data for offline RL in ARC, and validated their effectiveness with LDCQ on a simple task.

Findings

01. Offline RL with SOLAR improves decision-making in ARC

02. LDCQ successfully identifies correct answer states

03. Generated data enables strategic reasoning in complex environments

Abstract

Effective long-term strategies enable AI systems to navigate complex environments by making sequential decisions over extended horizons. Similarly, reinforcement learning (RL) agents optimize decisions across sequences to maximize rewards, even without immediate feedback. To verify that Latent Diffusion-Constrained Q-learning (LDCQ), a prominent diffusion-based offline RL method, demonstrates strong reasoning abilities in multi-step decision-making, we aimed to evaluate its performance on the Abstraction and Reasoning Corpus (ARC). However, applying offline RL methodologies to enhance strategic reasoning in AI for solving tasks in ARC is challenging due to the lack of sufficient experience data in the ARC training set. To address this limitation, we introduce an augmented offline RL dataset for ARC, called Synthesized Offline Learning Data for Abstraction and Reasoning (SOLAR), along…

Taxonomy

Topics: Neural Networks and Applications

Methods: Q-Learning