ReflexGrad: A Dual-Process Architecture for Gradient-Free Inference-Time Learning
Ankush Kadu, Ashwanth Krishnan

TL;DR
ReflexGrad introduces a gradient-free, inference-time learning framework that enables large language models to adapt dynamically during execution without retraining or weight updates, improving performance on diverse tasks.
Contribution
The paper presents ReflexGrad, a novel dual-process architecture that allows models to perform real-time adaptation through textual feedback and causal diagnosis without gradient updates.
Findings
Achieves strong zero-shot performance across various tasks.
Eliminates the need for multiple trials or retraining during inference.
Demonstrates practical viability of gradient-free inference-time learning.
Abstract
Scaling inference-time compute has emerged as a powerful paradigm--yet deliberating longer is not the same as learning. Current approaches to extended reasoning in large language models allocate more computation to thinking but remain fundamentally static: they cannot adapt from mistakes encountered during execution. Online reinforcement learning offers adaptation but requires gradient updates at runtime--expensive, prone to catastrophic forgetting, and unstable in deployment. We introduce ReflexGrad, a gradient-free framework for genuine inference-time learning: adaptation without retraining, without weight updates, without demonstrations. Our key insight is that effective runtime learning requires two complementary mechanisms--rapid policy refinement during forward progress, and deliberate causal diagnosis when stuck--with intelligent routing between them. ReflexGrad implements this…
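The routing idea in the abstract (fast refinement while progressing, slower causal diagnosis when stuck) can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract excerpt above is truncated and does not specify the mechanism. The class name `DualProcessAgent`, the `stuck_threshold` parameter, the stall counter, and the `refine`/`diagnose` callbacks are all hypothetical, and in a real system the callbacks would presumably be LLM calls. The only property carried over from the source is that adaptation happens through text, with no gradients or weight updates.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DualProcessAgent:
    """Toy gradient-free learner: all 'learning' is appended text notes.

    Every name and rule here is an illustrative assumption, not taken
    from the paper.
    """
    stuck_threshold: int = 2          # hypothetical: stalled steps before diagnosis
    policy_notes: List[str] = field(default_factory=list)
    stalled_steps: int = 0

    def route(self, made_progress: bool) -> str:
        """Choose the fast path ('refine') or the slow path ('diagnose')."""
        if made_progress:
            self.stalled_steps = 0
            return "refine"
        self.stalled_steps += 1
        if self.stalled_steps >= self.stuck_threshold:
            return "diagnose"
        return "refine"

    def update(
        self,
        made_progress: bool,
        feedback: str,
        refine: Callable[[str], str],
        diagnose: Callable[[str], str],
    ) -> str:
        """Turn textual feedback into a policy note; no weights are touched."""
        mode = self.route(made_progress)
        handler = refine if mode == "refine" else diagnose
        self.policy_notes.append(handler(feedback))
        if mode == "diagnose":
            self.stalled_steps = 0    # assumed: a diagnosis resets the stall counter
        return mode


# Usage with stand-in string functions in place of LLM calls.
agent = DualProcessAgent(stuck_threshold=2)
progress_signal = [True, False, False, True]
modes = [
    agent.update(
        made_progress=p,
        feedback=f"step {i}",
        refine=lambda fb: "refine: " + fb,
        diagnose=lambda fb: "diagnose: " + fb,
    )
    for i, p in enumerate(progress_signal)
]
print(modes)
```

In this sketch the router falls back to the slow path only after two consecutive steps without progress; the threshold and the reset rule are design choices made for illustration.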
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
