CosmoCore: Affective Dream-Replay Reinforcement Learning for Code Generation
Santhosh Kumar Ravindran

TL;DR
CosmoCore is a neuroscience-inspired reinforcement learning architecture that uses affective signals to improve code generation by prioritizing error correction, reducing hallucinations, and accelerating learning in large language models.
Contribution
It introduces a novel affective RL framework that incorporates valence and surprise signals to enhance code generation, extending beyond traditional human feedback methods.
Findings
Reduces hallucinated code by 48%
Accelerates self-correction by 45%
Validates effectiveness on multiple benchmarks
Abstract
We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). Motivated by human and animal learning, where embarrassment from mistakes drives rapid correction (as when a puppy learns to avoid repeating an error after a single scolding), CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). Episodes with highly negative valence ("cringe" episodes), such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks like HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g.,…
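The replay scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names, the value ranges, and the thresholds (`cringe_threshold`, `surprise_floor`) are hypothetical, and the valence and surprise scores are passed in directly rather than produced by the MLP tagger. "Five-fold replay" is interpreted here as giving Dream Queue episodes five times the sampling weight of ordinary episodes, which is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    prompt: str
    code: str
    valence: float   # assumed range [-1, 1]; strongly negative = "cringe"
    surprise: float  # assumed range [0, 1]


class DreamReplayBuffer:
    """Toy buffer: prioritize cringe episodes, prune unsurprising successes."""

    def __init__(self, cringe_threshold=-0.5, surprise_floor=0.1, replay_factor=5):
        # All three defaults are illustrative assumptions, not values from the paper.
        self.cringe_threshold = cringe_threshold
        self.surprise_floor = surprise_floor
        self.replay_factor = replay_factor
        self.dream_queue = []  # high-negative-valence episodes
        self.regular = []      # everything else that is kept

    def add(self, traj):
        """Route a tagged trajectory and report where it went."""
        if traj.valence <= self.cringe_threshold:
            self.dream_queue.append(traj)
            return "dream"
        if traj.valence > 0 and traj.surprise < self.surprise_floor:
            # Low-surprise success: dropped to limit buffer growth.
            return "pruned"
        self.regular.append(traj)
        return "regular"

    def replay_pool(self):
        """Dream Queue episodes appear replay_factor times in the pool."""
        return self.dream_queue * self.replay_factor + self.regular

    def sample(self, batch_size, rng=None):
        """Draw a batch (with replacement) for an off-policy update."""
        rng = rng or random.Random()
        pool = self.replay_pool()
        return rng.choices(pool, k=batch_size) if pool else []


buf = DreamReplayBuffer()
buf.add(Trajectory("parse csv", "buggy code", valence=-0.9, surprise=0.8))
buf.add(Trajectory("sum list", "routine success", valence=0.9, surprise=0.05))
buf.add(Trajectory("merge dicts", "novel success", valence=0.6, surprise=0.7))
batch = buf.sample(4, random.Random(0))
```

In this sketch the buggy episode lands in the Dream Queue, the routine success is pruned, and the novel success is kept with ordinary weight, so the replay pool holds six entries, five of which are copies of the buggy episode.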
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques
