Online Structured Prediction via Coactive Learning
Pannaga Shivaswamy, Thorsten Joachims

TL;DR
This paper introduces Coactive Learning, a framework in which a system learns from implicit user feedback, such as clicks, to improve structured predictions like search rankings, achieving low regret without ever observing explicit utility values.
Contribution
It presents the Coactive Learning model, in which the user's feedback (a slightly improved object) is inferred from observable user behavior, and develops learning algorithms with provable regret bounds for structured prediction.
Findings
Algorithms achieve ${\cal O}(\frac{1}{\sqrt{T}})$ average regret
Effective in web-search ranking tasks
Applicable to movie recommendation systems
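The ${\cal O}(\frac{1}{\sqrt{T}})$ figure refers to average regret. Writing $U(x, y)$ for the user's utility of object $y$ in context $x$ (a value the learner never observes) and $y_t$ for the system's prediction at step $t$, the average regret over $T$ steps can be written as follows. The symbol names here are chosen for illustration.

```latex
\mathrm{REG}_T = \frac{1}{T}\sum_{t=1}^{T}\Big(U(x_t, y_t^{*}) - U(x_t, y_t)\Big),
\qquad y_t^{*} = \arg\max_{y} U(x_t, y)
```

A bound of ${\cal O}(\frac{1}{\sqrt{T}})$ means this quantity shrinks toward zero as interaction continues: since $\frac{1}{\sqrt{100T}} = \frac{1}{10}\cdot\frac{1}{\sqrt{T}}$, running 100 times longer cuts the bound by a factor of 10.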
Abstract
We propose Coactive Learning as a model of interaction between a learning system and a human user, where both have the common goal of providing results of maximum utility to the user. At each step, the system (e.g. search engine) receives a context (e.g. query) and predicts an object (e.g. ranking). The user responds by correcting the system if necessary, providing a slightly improved -- but not necessarily optimal -- object as feedback. We argue that such feedback can often be inferred from observable user behavior, for example, from clicks in web-search. Evaluating predictions by their cardinal utility to the user, we propose efficient learning algorithms that have ${\cal O}(\frac{1}{\sqrt{T}})$ average regret, even though the learning algorithm never observes cardinal utility values as in conventional online learning. We demonstrate the applicability of our model and learning…
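The interaction loop described in the abstract can be sketched as a small simulation. It uses a perceptron-style update, w <- w + phi(x, y_bar) - phi(x, y), which moves the weights toward the user's improved object y_bar and away from the prediction y; this is the kind of update the full paper studies (there called the Preference Perceptron), but the rest is assumption. The sketch assumes a linear utility with a hidden weight vector w_star, represents each context directly by random joint feature vectors phi(x, y) for a handful of candidate objects, and simulates a user who returns the candidate with the next-higher utility (better, but not necessarily optimal). The names simulate_coactive, dot, and w_star are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simulate_coactive(rounds, w_star, n_candidates=5, seed=0):
    """Simulate the Coactive Learning loop with a perceptron-style update.

    Assumptions (hypothetical, for illustration only):
      * each context x_t is represented directly by the joint feature
        vectors phi(x_t, y) of n_candidates random candidate objects y;
      * the user's utility is linear, U(x, y) = w_star . phi(x, y),
        and is never revealed to the learner;
      * the user returns the candidate with the next-higher utility,
        i.e. a slightly better but not necessarily optimal object.
    Returns the learned weights and the average regret over all rounds.
    """
    rng = random.Random(seed)
    dim = len(w_star)
    w = [0.0] * dim
    total_regret = 0.0
    for _ in range(rounds):
        # Context: random joint feature vectors, one per candidate object.
        cands = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(n_candidates)]
        utils = [dot(w_star, phi) for phi in cands]

        # Prediction y_t: the candidate maximizing the learner's score w . phi.
        pred = max(range(n_candidates), key=lambda i: dot(w, cands[i]))

        # Simulated feedback y_bar_t: next-better candidate, or pred if optimal.
        better = [i for i in range(n_candidates) if utils[i] > utils[pred]]
        fb = min(better, key=lambda i: utils[i]) if better else pred

        # Regret is computed for evaluation only; the learner never sees it.
        total_regret += max(utils) - utils[pred]

        # Perceptron-style update: w <- w + phi(x, y_bar) - phi(x, y).
        w = [wi + a - b for wi, a, b in zip(w, cands[fb], cands[pred])]
    return w, total_regret / rounds
```

For example, w, avg_regret = simulate_coactive(2000, w_star=[1.0, -0.5, 0.25]) runs 2000 simulated interactions. Because the simulated feedback has strictly higher true utility whenever it differs from the prediction, every update increases w_star . w even though the learner only ever sees pairs of objects, never utility values.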
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Data Stream Mining Techniques
