Optimizing Life Sciences Agents in Real-Time using Reinforcement Learning
Nihir Chadderwala

TL;DR
This paper introduces a reinforcement learning framework for life sciences AI agents that improves decision-making by learning from user feedback, enhancing query handling without needing labeled data.
Contribution
It presents a novel combination of AWS Strands Agents with Thompson Sampling bandits to optimize AI agent decisions in real-time based on user interactions.
Findings
Achieved 15-30% increase in user satisfaction over baselines.
Demonstrated effective learning patterns after 20-30 queries.
System adapts continuously without ground truth labels.
Abstract
Generative AI agents in life sciences face a critical challenge: determining the optimal approach for diverse queries ranging from simple factoid questions to complex mechanistic reasoning. Traditional methods rely on fixed rules or expensive labeled training data, neither of which adapts to changing conditions or user preferences. We present a novel framework that combines AWS Strands Agents with Thompson Sampling contextual bandits to enable AI agents to learn optimal decision-making strategies from user feedback alone. Our system optimizes three key dimensions: generation strategy selection (direct vs. chain-of-thought), tool selection (literature search, drug databases, etc.), and domain routing (pharmacology, molecular biology, clinical specialists). Through empirical evaluation on life science queries, we demonstrate 15-30% improvement in user satisfaction compared to random…
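The strategy-selection idea in the abstract can be illustrated with a minimal Beta-Bernoulli Thompson Sampling sketch. This is not the paper's implementation: the class name `ThompsonSamplingBandit`, the binary thumbs-up/thumbs-down reward, the use of one independent bandit per query category as a stand-in for a contextual bandit, and the simulated satisfaction rates are all assumptions made for illustration. No AWS Strands Agents API is used here.

```python
import random
from collections import defaultdict


class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson Sampling, one posterior per (context, arm).

    Hypothetical sketch: contexts are coarse query categories and rewards
    are binary user feedback (1 = satisfied, 0 = not satisfied).
    """

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # Each (context, arm) pair starts from a uniform Beta(1, 1) prior.
        self.params = defaultdict(lambda: [1.0, 1.0])

    def select(self, context):
        # Draw one sample from each arm's posterior and pick the largest.
        samples = {
            arm: self.rng.betavariate(*self.params[(context, arm)])
            for arm in self.arms
        }
        return max(samples, key=samples.get)

    def update(self, context, arm, reward):
        if reward not in (0, 1):
            raise ValueError("reward must be 0 or 1")
        alpha_beta = self.params[(context, arm)]
        alpha_beta[0] += reward       # successes
        alpha_beta[1] += 1 - reward   # failures

    def mean(self, context, arm):
        alpha, beta = self.params[(context, arm)]
        return alpha / (alpha + beta)


if __name__ == "__main__":
    # Assumed (made-up) satisfaction probabilities, used only to simulate users.
    true_rates = {
        ("factoid", "direct"): 0.8,
        ("factoid", "chain_of_thought"): 0.5,
        ("mechanistic", "direct"): 0.3,
        ("mechanistic", "chain_of_thought"): 0.7,
    }
    bandit = ThompsonSamplingBandit(["direct", "chain_of_thought"], seed=0)
    user = random.Random(1)
    for _ in range(200):
        context = user.choice(["factoid", "mechanistic"])
        arm = bandit.select(context)
        reward = 1 if user.random() < true_rates[(context, arm)] else 0
        bandit.update(context, arm, reward)
    for key in sorted(true_rates):
        print(key, round(bandit.mean(*key), 2))
```

The same pattern could in principle be repeated for the other two dimensions the abstract names (tool selection and domain routing) by swapping in a different arm list; how the paper actually encodes context and combines the three decisions is not stated in this summary.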
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · AI-based Problem Solving and Planning
