Sayer: Using Implicit Feedback to Optimize System Policies
Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins

TL;DR
Sayer is a methodology that uses implicit feedback and reinforcement learning techniques to evaluate and optimize system policies without deployment, improving decision-making in resource management.
Contribution
It introduces a novel approach combining implicit exploration and counterfactual estimators to leverage implicit feedback for policy evaluation and training.
Findings
Accurately evaluates policies in Azure scenarios
Outperforms existing production policies
Demonstrates unbiased policy assessment
Abstract
We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited <X min, because time has a cumulative property. This feedback tells us about alternative decisions, and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias. We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning -- randomized…
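To make the waiting example concrete, here is a minimal Python sketch of how implicit feedback could feed an inverse-propensity-style counterfactual estimate for a timeout policy. It is an illustration under stated assumptions, not Sayer's actual implementation: the cost model, the `penalty` value, the log format, and all function names are hypothetical. The logging policy picks a wait time at random from a known distribution `probs`. A logged wait reveals the outcome of a target wait either when the event was observed (every wait's outcome is then known) or when the logged wait was at least as long as the target. Each revealed cost is reweighted by the probability that the logging policy would have revealed it.

```python
import random


def cost(wait, event_time, penalty=10.0):
    """Hypothetical cost: time spent waiting, plus a penalty on timeout."""
    return event_time if event_time <= wait else wait + penalty


def simulate_log(wait, event_time):
    """Log entry: the event time is observed only if it occurred within the wait."""
    return (wait, event_time if event_time <= wait else None)


def implicit_ips_estimate(logs, probs, target_wait, penalty=10.0):
    """Estimate the mean cost of always waiting `target_wait`.

    logs  : list of (logged_wait, observed_event_time or None)
    probs : dict mapping each candidate wait to its logging probability
    """
    total = 0.0
    for logged_wait, observed in logs:
        if observed is not None:
            # Event seen: the outcome of every candidate wait is known.
            threshold = min(target_wait, observed)
            c = cost(target_wait, observed, penalty)
        elif logged_wait >= target_wait:
            # Timed out after waiting at least as long as the target,
            # so the target would have timed out too.
            threshold = target_wait
            c = target_wait + penalty
        else:
            # Timed out before the target wait: outcome not revealed.
            continue
        # Probability that a randomly logged wait reveals this outcome.
        reveal_prob = sum(p for w, p in probs.items() if w >= threshold)
        total += c / reveal_prob
    return total / len(logs)


if __name__ == "__main__":
    random.seed(0)
    probs = {2: 0.25, 4: 0.25, 6: 0.5}
    waits, weights = zip(*probs.items())
    logs = [
        simulate_log(random.choices(waits, weights)[0], random.uniform(0, 10))
        for _ in range(10000)
    ]
    print(implicit_ips_estimate(logs, probs, target_wait=4))
```

Dividing by the reveal probability rather than by the probability of the exact logged action is what lets the estimator use the implicit feedback. It stays unbiased in this toy setting only if the logging policy gives nonzero probability to some wait at least as long as the target.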
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Data Classification
