Seeker: Real-Time Interactive Search
Ari Biswas, Thai T. Pham, Michael Vogelsong, Benjamin Snyder, Houssam Nassif

TL;DR
Seeker is a real-time interactive search system enabling users to refine search results through simple feedback, improving search relevance by leveraging human input without requiring explicit item descriptions or representations.
Contribution
The paper introduces a novel interactive search algorithm that incorporates user feedback to adapt search results in real time, without needing explicit item representations.
Findings
Effective real-time search refinement demonstrated
Quantitative and qualitative evaluation confirms improved relevance
Human-in-the-loop experiments validate approach
Seeker: Real-Time Interactive Search
Ari Biswas (Amazon.com, Seattle, WA), Thai T. Pham (Amazon.com, Seattle, WA), Michael Vogelsong (Amazon.com, Seattle, WA), Benjamin Snyder, and Houssam Nassif (Amazon.com, Seattle, WA)
(2019)
Abstract.
This paper introduces Seeker, a system that allows users to adaptively refine search rankings in real time, through a series of feedback signals in the form of likes and dislikes. When searching online, users may not know how to accurately describe their product of choice in words. An alternative approach is to search an embedding space, allowing the user to query using a representation of the item (like a tune for a song, or a picture for an object). However, this approach requires the user to possess an example representation of their desired item. Additionally, most current search systems do not allow the user to dynamically adapt the results with further feedback. On the other hand, users often have a mental picture of the desired item and are able to answer ordinal questions of the form: “Is this item similar to what you have in mind?” With this assumption, our algorithm allows users to provide sequential feedback on search results to adapt the search feed. We show that our proposed approach works well both qualitatively and quantitatively. Unlike most previous representation-based search systems, we can quantify the quality of our algorithm through human-in-the-loop experiments.
Keywords: Interactive Search, Real Time Recommendation, Online Learning, Active Learning, Multi-Armed Bandit
Published in: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19), August 4–8, 2019, Anchorage, AK, USA. DOI: 10.1145/3292500.3330733. ISBN: 978-1-4503-6201-6/19/08. Copyright: rights retained.
CCS Concepts: Information systems: Search interfaces; Probabilistic retrieval models; Information retrieval diversity; Test collections; Relevance assessment. Computing methodologies: Online learning settings; Active learning settings; Discrete space search; Search with partial observations; Sequential decision making.
1. Introduction
Search engines and online shopping websites maintain indices with millions of items. Often, it is difficult for a user to accurately describe in words what they are looking for (Teo et al., 2016). Even if the user is able to describe their target item effectively, large index and catalog sizes mean it is difficult to sift through similar items efficiently.
Consider the situation in which a user is searching for a new movie to watch. They have a mental representation of the characteristics of the movie they would enjoy but are not acquainted with the genre keywords, latest movies, actors or directors. Being unfamiliar with current movie jargon, they are unable to accurately describe their preferred movie with a traditional keyword interface, nor do they have an example photograph. However, suppose we show the same user another movie they have seen and ask “Is this movie similar to the one you have in mind? Yes or no?” People can answer such ordinal questions with less noise than absolute judgments, i.e. finding the exact words to describe their choice (Stewart et al., 2005).
The above scenario is not restricted to movies only. In the case of browsing for a song on a media platform, searching for a news article on a news website, or a dress on an online platform, the user may not be able to accurately describe the desired item in a traditional keyword interface. But users could provide relative judgments based on what they have experienced before. For example, answers to queries like “Songs similar to Heroes by David Bowie: Yes or no?” or “News similar to that of the Queen’s involvement with Brexit: Yes or no?” are easier to provide.
In addition, traditional search engines (Ledford, 2015; Smith and Linden, 2017) and the newer representation search systems (described in Section 2) are temporally static. The engines use text or imagery as the query and respond with a ranked list of results. This ranking is based on an estimate of relevance to the user in their current context (location, historical searches, etc.). They do not give the user the opportunity to adapt and fine-tune the resulting page with additional feedback. In traditional engines, for a given user in a given session, queries are independent of one another. Figure 1 illustrates the difference between traditional engines and our setting.
In this paper, we describe our system, Seeker, that dynamically refines search results based on real-time interactions with the user (in the form of likes and dislikes) within a single search session. From a customer perspective, this system adds the feeling of an “in-store” shopping discovery experience, with a personal curator. In our setting, the user scrolls through a page of items and may “like” or “dislike” any item at any time. The data gathered from these preferences is used to update the list of results shown in real time, thereby iteratively closing in on what they are looking for. To our knowledge, Seeker is the first interactive and dynamic search experience which enables the user to seamlessly zoom in, zoom out, and pivot by scrolling up and down and selecting items to like and dislike in an adaptive manner.
In this work we make the following contributions:
- Introduce Seeker, an interactive recommendation algorithm deployed at scale, which adapts to customer inputs in real time.
- Propose a novel evaluation metric with humans in the loop that allows us to quantify the quality of our proposed algorithm and evaluate it against other methods. Most embedding-based representational search engines in the past have evaluated their systems only qualitatively rather than quantitatively. In our experiments, we simulate the task of searching for a particular item, and quantifiably measure progress.
The paper is organized as follows. In Section 2, we review related papers and search engines. In Section 3, we describe how we model human preferences expressed in likes and dislikes and translate those preferences into probability distributions over our catalog. In Section 4, we present our adaptive algorithm for making real time recommendations. In Section 5, we evaluate Seeker’s results. In Section 6, we discuss directions for future research. In Section 7, we summarize our work.
2. Related Work
Over the last few years, there has been a growing trend of exploring new interfaces beyond traditional keyword search, and in particular, visual-based search (Datta et al., 2008). In (Hadi Kiapour et al., 2015), users query relevant items by uploading real world photographs of clothing. The engine then displays results that are visually similar to the query photograph. Pinterest built a system which allowed users to hover over pins and find visually similar items in the catalog (Jing et al., 2015a). An advantage of these systems is that they help people find things using an understanding they might not be able to put into words.
Many lines of research have focused on learning the relative similarities of images. They accomplish this by mapping each image to a numerical vector, so they can capture the visual similarities in Euclidean space (Lai et al., 2015; Babenko et al., 2014; Li et al., 2016; Xia et al., 2014; Zhu et al., 2016). Using a similar approach but in a scalable manner, companies have also rolled out their visual search platforms, from Google Goggles, Google Similar Images, and Amazon Flow to Microsoft (Bing) (Hu et al., 2018), Pinterest (Jing et al., 2015b), eBay (Yang et al., 2017), and Alibaba (Zhang et al., 2018).
All these methods, however, require the user to provide a photograph of the targeted item. They fail when users do not have an actual visual representation of the desired item, but instead a mental picture of it. The users themselves may not know how to properly describe their mental visualization in words. Our algorithm addresses this issue, as Seeker is able to work with any embedding representation, including visual, textual and audio.
Moreover, Seeker dynamically adjusts the search results based on interactive user feedback, whereas none of the projects mentioned above allow users to fine-tune their current-session search with additional feedback. While we use proprietary embeddings in the examples of this paper, the underlying engine can operate upon features derived from other domains (or combinations of domains) as well (customer behavior, language understanding, audio, etc.).
3. Problem Formulation
3.1. The Setting
Figure 1 illustrates how Seeker is different from traditional search. The user starts with a ranked list of results and provides feedback in the form of likes or dislikes; the search engine then generates a new set of ranked results, updating the page in real-time. In this section, we define the notation to formally describe the above search process.
Assume that we have a catalog of $N$ items, of which only a fixed number can be displayed at once. We model user feedback as a sequence of likes and dislikes over discretized timesteps. The user starts with an initial ranking of items at the first timestep. This initial ranking can be thought of as Seeker’s prior belief about what the user desires; it can be generated from a traditional search or recommendation engine, and may incorporate diversity or business requirements.
The user interacts with the page by liking or disliking items. At each timestep, Seeker produces a new ranked list of results based on the feedback received so far. It does so by constructing a discrete probability distribution over the catalog of items at each timestep. The probability distribution represents the likelihood of each item being the user’s desired item.
We featurize each catalog item by embedding it into a vector space. Seeker requires a high correlation between human perception of similarity and the distance metric in the embedded vector space. Depending on the properties of the items displayed, the embedding strategies described in (Le and Mikolov, 2014; Peters et al., 2018; Devlin et al., 2018; Szegedy et al., 2016) have been shown to correlate with human perception.
Seeker can be divided into three major components, as seen in Figure 2. Section 3.2 describes how we convert likes and dislikes to preference pairs and probability distributions. Section 4.1 details how we use preference pairs to estimate a target’s likelihood. Section 4.4 shows how we use probabilistic sampling to recommend items to users at each timestep.
3.2. Pairwise Comparison
Let for be the vector representations of the liked items, and for the vector representations of the disliked items. We will drop the superscript when the context is clear. Let and be the non-empty subsets of . We define as the preference pair which consists of a liked item and a disliked item from sets and respectively. We create preference pairs from all cross-pairings between the likes and dislikes.
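The cross-pairing step can be sketched in a few lines. This is a minimal illustration, assuming liked and disliked items are held in plain Python lists; the function name `preference_pairs` is hypothetical and not part of Seeker.

```python
from itertools import product


def preference_pairs(liked, disliked):
    """Return every (liked, disliked) cross-pairing as a preference pair."""
    return list(product(liked, disliked))


# Two likes and three dislikes yield 2 x 3 = 6 preference pairs.
pairs = preference_pairs(["a", "b"], ["x", "y", "z"])
print(len(pairs))
```

Because every like is paired with every dislike, the number of pairs grows as the product of the two set sizes.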
The intuition behind preference pairs follows from our assumption that the user has some ideal item (referred to as the target) in mind that they wish to find. Then $S_{ij}$ represents the preference that the user thinks item $i$ is more similar to their desired item than item $j$, i.e. they prefer $i$ over $j$ given $t$:
[TABLE]
Equation 3.1 resolves to the target $t$ being spatially closer to item $i$ than it is to item $j$. In this paper we use the Euclidean distance to measure vector similarity, but Seeker is agnostic to the metric used.
We use preference pairs to model the probability of a catalog item being the hidden target item $t$, represented by its embedding vector. If we were to present a user with item $i$ and item $j$, what is the probability that they choose $i$ over $j$? Questions of this form are known as triplets in the machine learning literature (Jain et al., 2016; Schroff et al., 2015). Equation 3.2 mathematically models our question:
[TABLE]
where .
Intuitively, the answer to the above triplet question should depend on how similar items $i$ and $j$ are to $t$. As similarity and distance are equivalent in our world, the probability of preferring $i$ over $j$ becomes a function of how close $i$ and $j$ are to $t$. According to this model (and Equation 3.2), if items $i$ and $j$ are equidistant from the target $t$, then they are equally preferred, and the probability of choosing $i$ over $j$ is $0.5$. If $i$ is the target while $j$ is infinitely far away, the probability of choosing $i$ becomes $1$. For items in the middle we get a smooth noise model that accounts for the stochasticity in human decisions.
Our model includes a preference hyperparameter , which represents our confidence in the vector space representation:
- At one extreme, $\mathbb{P}(S_{ij} \mid t) = 0.5$ for all combinations of targets, likes and dislikes. This means our embeddings have no correlation with human judgment of similarity, and preferring $i$ over $j$ is as good as a fair coin flip.
- At the other extreme, $\mathbb{P}(S_{ij} \mid t) = 1$. This removes randomness from the decision process, perfectly aligns our representation of human judgment with the metric distance, and deterministically picks the closer item.
We use for the results discussed in Section 5.
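The limiting behaviours described above can be checked numerically. The sketch below uses one ratio-of-distances form that has these properties; it is an assumption made for illustration and is not claimed to be the exact form of Equation 3.2. The function name `preference_probability` and the exponent name `alpha` are hypothetical.

```python
import math


def preference_probability(x_i, x_j, x_t, alpha):
    """P(user prefers i over j | target t) under an ASSUMED ratio model.

    The closer item i is to the target relative to item j, the higher the
    probability. alpha plays the role of the confidence hyperparameter.
    """
    d_i = math.dist(x_i, x_t)  # Euclidean distance from item i to the target
    d_j = math.dist(x_j, x_t)  # Euclidean distance from item j to the target
    if d_i == 0.0 and d_j == 0.0:
        return 0.5  # both items coincide with the target: no preference
    return d_j ** alpha / (d_i ** alpha + d_j ** alpha)


# Equidistant items are equally preferred.
print(preference_probability((0, 0), (2, 0), (1, 0), alpha=2.0))
# With alpha = 0 the embedding carries no information: a fair coin flip.
print(preference_probability((0, 0), (5, 0), (1, 0), alpha=0.0))
# A large alpha almost deterministically picks the closer item.
print(preference_probability((0, 0), (3, 0), (1, 0), alpha=50.0))
```

Under this assumed form, a single exponent interpolates between the coin-flip regime and the deterministic regime, which matches the role the text assigns to the confidence hyperparameter.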
4. Item Ranking
In this section we describe how we go from preference pairs and a noise model to a ranked list of items to be displayed to the user.
4.1. Target Estimation
To keep our notation consistent, we always assume that the user prefers item $i$ to $j$ when we write $S_{ij}$. We make the further assumption that preference pairs are independent of one another. This is a simplifying assumption which serves as a good baseline (Chapelle and Li, 2011; Hill et al., 2017). In Section 6.1, we investigate ways to drop the independence assumption. Equation 4.1 represents the joint likelihood of observing the preference pairs, given target item $t$ and the sets of likes and dislikes:
[TABLE]
where is defined as in Equation 3.2. The log-likelihood becomes:
[TABLE]
We do not know a priori what the hidden target $t$ is. Our goal is to find or approximate it. We note that $t$ may not be present in our catalog, and in this case our goal is to find an item as similar to $t$ as possible. In order to build a probability distribution over our catalog, we borrow ideas from (Tamuz et al., 2011). We use the same noise model, but apply it to recommend items to the user, instead of learning a metric space. For each catalog item, we compute the log-likelihood mass of that item being the target, given the user’s likes and dislikes, as shown in Algorithm 1.
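The per-item log-likelihood computation can be sketched as follows. This is not a reproduction of Algorithm 1: the noise model in `pair_probability` is an assumed ratio-of-distances form, the smoothing constant `eps` is added here only to avoid `log(0)`, and all names are hypothetical.

```python
import math


def pair_probability(d_like, d_dislike, alpha=2.0, eps=1e-12):
    """ASSUMED noise model: probability of preferring the liked item.

    eps keeps the probability strictly positive so its log is defined.
    """
    num = d_dislike ** alpha + eps
    return num / (d_like ** alpha + d_dislike ** alpha + 2 * eps)


def log_likelihoods(catalog, liked, disliked, alpha=2.0):
    """Score every catalog vector as a candidate target.

    Under the independence assumption, the log-likelihood of a candidate is
    the sum of log-probabilities over all liked/disliked cross-pairings.
    """
    scores = []
    for x_t in catalog:
        total = 0.0
        for x_i in liked:
            for x_j in disliked:
                p = pair_probability(
                    math.dist(x_i, x_t), math.dist(x_j, x_t), alpha
                )
                total += math.log(p)
        scores.append(total)
    return scores


catalog = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
scores = log_likelihoods(catalog, liked=[(1.0, 0.0)], disliked=[(4.0, 0.0)])
best = scores.index(max(scores))
print(best)  # index of the candidate most consistent with the feedback
```

The cost is one pass over the catalog per update, with work proportional to the number of preference pairs for each candidate.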
4.2. Posterior Construction
Instead of presenting items according to their likelihood of being the target, we allow for the inclusion of priors into our model. Let be the prior probability of item being the actual desired target. One can compute such priors using traditional search engines, and personalize them using the user’s browsing or purchase history (Teo et al., 2016).
Given priors , the posterior probability of an item being the target is:
[TABLE]
and the log-posterior becomes:
[TABLE]
At each timestep the user provides feedback, causing the set of observed preferences to grow. Therefore the log-likelihood will eventually dominate the posterior density score. In the early stages, when we have fewer likes and dislikes, our posterior belief about the target is dominated by a well-founded prior. This prevents us from having to wait a long time before showing meaningful results.
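The interplay between prior and likelihood can be illustrated with a short sketch. It assumes only Bayes' rule in log form (log-posterior equals log-prior plus log-likelihood, up to a constant); the numbers and the function names `log_posterior` and `normalize` are made up for illustration.

```python
import math


def log_posterior(log_prior, log_likelihood):
    """Unnormalized log-posterior per item: log prior + log likelihood."""
    return [lp + ll for lp, ll in zip(log_prior, log_likelihood)]


def normalize(log_scores):
    """Turn log scores into probabilities using the log-sum-exp trick."""
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical two-item catalog: the prior favors item 0, while each
# preference pair is better explained by item 1.
log_prior = [math.log(0.8), math.log(0.2)]
per_pair_log_lik = [-1.0, -0.1]

for n_pairs in (1, 10):
    log_lik = [n_pairs * v for v in per_pair_log_lik]
    print(n_pairs, normalize(log_posterior(log_prior, log_lik)))
```

With a single pair the prior still decides the ranking, while with ten pairs the accumulated likelihood overrides it, which is the behaviour described above.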
4.3. Items Recommendation
We consider four different ways to display items to the user:
4.3.1. Pure Exploitation/Noiseless
The simplest approach is to sort the posteriors and recommend the top items. Theoretically, this prevents us from exploring the search space. Practically, this leads to a poor user experience with limited product diversity.
4.3.2. Pure Exploration/Random
The other extreme solution is to show random results all the time, completely ignoring the posterior densities.
4.3.3. Epsilon-greedy
Another approach is to randomize some of the results while leaving the others untouched, as in Algorithm 2. We rank items by their posterior densities, and replace each item with a random item with probability $\epsilon$. See (Bubeck et al., 2012) for a detailed study of $\epsilon$-greedy algorithms.
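A minimal sketch of this replacement scheme is given below. It follows the verbal description only and is not a transcription of Algorithm 2; the function name `epsilon_greedy`, the page size `k`, and the choice to draw replacements uniformly from the whole catalog (which may occasionally produce duplicates) are assumptions.

```python
import random


def epsilon_greedy(posteriors, k, epsilon, rng=random):
    """Build a page of k item indices.

    Items are ranked by posterior density; each displayed slot is then
    swapped for a uniformly random catalog item with probability epsilon.
    """
    ranked = sorted(
        range(len(posteriors)), key=lambda i: posteriors[i], reverse=True
    )
    page = ranked[:k]
    for slot in range(len(page)):
        if rng.random() < epsilon:
            page[slot] = rng.randrange(len(posteriors))
    return page


posteriors = [0.1, 0.5, 0.2, 0.9]
print(epsilon_greedy(posteriors, k=2, epsilon=0.0))  # pure exploitation
print(epsilon_greedy(posteriors, k=2, epsilon=0.3, rng=random.Random(0)))
```

Setting `epsilon` to 0 recovers the noiseless strategy and setting it to 1 recovers the random strategy.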
4.4. Boltzmann Exploration
The fourth method to recommend items involves sampling without replacement according to the item’s posterior densities. Let be a score associated with item . A popular way to generate a discrete distribution over the items is by using the exponential weighing scheme, known as the softmax or Boltzmann equation:
[TABLE]
Here, is our belief probability that item is the true target. Even though is unconstrained in , common values are and , the latter resulting in polynomial weighing (Szepesvári, 2010; Jang et al., 2017). Note that if the items were equally spaced, sampling from the discrete distribution is asymptotically equivalent to sampling from the hidden continuous distribution, as we show in Appendix A.
Sampling without replacement when is large can prove to be very slow. When and are large, normalizing our posterior densities can lead to precision issues with sampling. We can overcome this problem by using the Gumbel-Max trick (Maddison et al., 2014), which shows that adding standard Gumbel noise to and taking the is equivalent to sampling according to Boltzmann (Equation 4.5):
[TABLE]
We sketch the proof for completeness. Let . By the additive property, , with probability density function (PDF):
[TABLE]
and cumulative distribution function (CDF):
[TABLE]
Proof.
Define by the probability that is the largest among all . We have:
[TABLE]
∎
Since the added Gumbel noises are independent, showing the items with the highest scores is equivalent to sampling items without replacement from Equation 4.5.
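The Gumbel-Max procedure is short enough to sketch directly. The code below perturbs each score with independent standard Gumbel noise and keeps the top entries; the function name `gumbel_top_k` is hypothetical, and the scores are assumed to be log-weights, so that an item with score `log(3)` should be drawn first about three times as often as an item with score `log(1)`.

```python
import math
import random


def gumbel_top_k(scores, k, rng=random):
    """Sample k distinct indices by adding Gumbel(0, 1) noise to each score."""
    perturbed = []
    for idx, s in enumerate(scores):
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        g = -math.log(-math.log(u))  # inverse-CDF draw from standard Gumbel
        perturbed.append((s + g, idx))
    perturbed.sort(reverse=True)
    return [idx for _, idx in perturbed[:k]]


rng = random.Random(0)
scores = [math.log(1.0), math.log(3.0)]
draws = 20000
first_is_1 = sum(gumbel_top_k(scores, 1, rng)[0] == 1 for _ in range(draws))
print(first_is_1 / draws)  # empirical frequency, expected to be near 0.75
```

No normalization constant is ever computed, which is what makes the trick attractive for large catalogs.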
To balance exploration and exploitation, one resorts to annealing (Aarts and Korst, 1988), with an appropriately tuned sequence of learning rate parameters (aka inverse temperature) for each timestep :
[TABLE]
Note that recovers the pure exploration mode, and recovers the pure exploitation mode. Varying allows us to trade-off exploitation and exploration.
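The effect of the inverse temperature can be seen in a small sketch. It assumes the usual parametrization in which the inverse temperature multiplies the score inside the exponential; the symbol name `beta` and the function name `boltzmann` are assumptions for illustration.

```python
import math


def boltzmann(scores, beta):
    """Softmax distribution over items with inverse temperature beta."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(beta * (s - m)) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


scores = [1.0, 2.0, 3.0]
print(boltzmann(scores, beta=0.0))     # uniform: pure exploration
print(boltzmann(scores, beta=1.0))     # intermediate trade-off
print(boltzmann(scores, beta=1000.0))  # mass on the best item: exploitation
```

An annealing schedule then amounts to choosing a different value of `beta` at each timestep.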
On the other hand, similarly to the proof above, we have
[TABLE]
Note that, by dividing by , we establish:
[TABLE]
Sampling from and taking the maximum, as in Equation 4.4, is similar to Thompson Sampling in a bandit setting (Thompson, 1933; Russo et al., 2018). The crucial difference (and drawback) is that the Gumbel method doesn’t take into account the uncertainty of the reward estimates.
Finding the right schedule for can be very difficult in practice (Vermorel and Mohri, 2005). In (Cesa-Bianchi et al., 2017), the authors provide an annealing schedule for in a standard stochastic multi-armed bandit setting, guaranteeing sublinear regret. Let be the number of times arm has been played up to timestep . For some constant , they set , and sample according to:
[TABLE]
Equation 4.13 decouples the learning rates of the individual items, and factors-in the uncertainty of the reward estimates. We now have a proper way to sample from a Boltzmann, with convergence guarantees. Even though our setting is not exactly the same as (Cesa-Bianchi et al., 2017), we borrow parts of their sampling strategy to recommend items to the user. As detailed in the theoretical justifications of Appendix B, we recommend setting for .
As a user can repeatedly interact with the same item, we treat as the number of times a user interacts with item . It starts with and is incremented with every like or dislike to item . Putting it all together, we obtain our final Boltzmann sampling algorithm (Algorithm 3).
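The per-item sampling scheme can be sketched as below. This is not a transcription of Algorithm 3. It assumes that the per-item noise scale takes the form `c / sqrt(n_i)`, in the spirit of the schedule attributed above to (Cesa-Bianchi et al., 2017), and that interaction counts start at 1 to avoid division by zero; the names `boltzmann_gumbel_page`, `record_feedback`, and `c` are hypothetical.

```python
import math
import random


def boltzmann_gumbel_page(scores, counts, k, c=1.0, rng=random):
    """Rank items by score + beta_i * Gumbel noise and return the top k.

    beta_i = c / sqrt(counts[i]) is an ASSUMED per-item scale: items the user
    has interacted with more often receive less exploration noise.
    """
    perturbed = []
    for idx, (s, n) in enumerate(zip(scores, counts)):
        beta = c / math.sqrt(n)
        u = rng.random()
        while u == 0.0:  # guard against log(0)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel draw
        perturbed.append((s + beta * g, idx))
    perturbed.sort(reverse=True)
    return [idx for _, idx in perturbed[:k]]


def record_feedback(counts, item):
    """Increment the interaction count after a like or dislike."""
    counts[item] += 1


scores = [0.1, 0.5, 0.3]
counts = [1, 1, 1]  # assumed initial value
record_feedback(counts, 2)
print(boltzmann_gumbel_page(scores, counts, k=2, rng=random.Random(0)))
```

Decoupling the noise scale per item means that uncertainty shrinks only where feedback has actually been observed.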
5. Evaluation
As our experiments require human judgments, there exist no such ground truth datasets for validation. Instead we propose an experimental framework with a human in the loop that simulates the Seeker experience and generates quantifiable metrics. The evaluation study serves as a benchmark for future sequential search algorithms.
5.1. Experimental Setup
Seeker assumes that the user has a mental image of a target item they cannot easily express in words. When accessing Seeker, the user is presented with a subset of items to interact with, using like or dislike clicks. At any moment, the user can expand the catalog listing view by clicking on “Explore More”. Our experimental setting mimics this initial user experience.
A single experimental session involves the following: A user is presented with a target item . This target item is an explicit simulation of the user’s hidden target. At each timestep, we present the user with a grid of items. The user’s goal is to find the target item through a series of feedbacks. At each timestep, they may like, dislike, or remove a previously liked/disliked item. Upon receiving user feedback, we recommend new items to view in the next timestep. The session goes on for timesteps. If the user can find the target within the items, they may stop playing. Otherwise, they try to get as close to the target item as possible based on their perception of similarity. For our experiments, we set , , , , , and used an uninformative prior.
We enlisted volunteers to participate in the experiment defined above, and collected 358 (roughly 90 per sampling algorithm) unique sessions. The target and exploration algorithm for each session were selected uniformly at random. Users were instructed to like and dislike items assuming that they wanted to purchase the target item. Users had no prior knowledge of the selected catalog or algorithm. At each timestep, we invoke Seeker to generate a posterior distribution over the catalog, according to Equation 4.4. This distribution induces a natural ranking on the items. We monitor the normalized rank of the target item at each timestep. The normalized rank is defined as the rank of the target item divided by the size of the catalog. A target with a normalized rank of $0.1$ has a final rank of $\lceil 0.1 \cdot N \rceil$.
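The normalized rank can be computed directly from the posterior scores. The sketch below is illustrative: the function name `normalized_rank` is hypothetical, and ties are resolved in favor of the target, which is an assumption not stated in the text.

```python
def normalized_rank(posteriors, target_index):
    """Rank of the target (1 = highest posterior) divided by catalog size.

    Ties are resolved optimistically: only strictly higher scores count.
    """
    target_score = posteriors[target_index]
    rank = 1 + sum(1 for p in posteriors if p > target_score)
    return rank / len(posteriors)


posteriors = [0.1, 0.4, 0.2, 0.3]
print(normalized_rank(posteriors, target_index=2))  # third of four items
```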
5.2. Experimental Results
Seeker aims at helping the user quickly zoom-in on the desired target item. A typical metric for such recommender systems is recall at k (Herlocker et al., 2004). As we have only one target of interest, we measure how close our recommendations are to target . We can do that using the target’s normalized rank. For a given session , let be the lowest normalized rank attained by in all timesteps. We define recall @ as the percentage of sessions with . For example, a recall of @ means that of sessions achieved a normalized ranking of or less.
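The recall metric defined above can be sketched as follows, assuming each session is stored as the list of the target's normalized ranks, one per timestep. The function name `recall_at` is hypothetical, and the value is returned as a fraction rather than a percentage.

```python
def recall_at(sessions, cutoff):
    """Fraction of sessions whose lowest normalized rank is <= cutoff."""
    hits = sum(1 for ranks in sessions if min(ranks) <= cutoff)
    return hits / len(sessions)


# Made-up sessions for illustration only.
sessions = [
    [0.50, 0.20, 0.05],
    [0.90, 0.60, 0.40],
    [0.30, 0.10, 0.20],
]
print(recall_at(sessions, cutoff=0.1))
```

Using the best rank over the whole session, rather than the final rank, credits a strategy for surfacing the target at any point during the interaction.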
Figure 3 plots recall @ for our sampling strategies. We plot up to , as the user is unlikely to scroll past higher percentiles. Boltzmann exploration achieves the highest recall, dominating all other strategies. Noiseless and Greedy perform similarly, outperforming random at lower recalls. Random improves at higher recalls due to its higher degree of exploration, where the target gets ranked high by pure chance.
Figure 4 plots the convergence time of our sampling strategies. From the user’s perspective, this reflects how long it takes to find a reasonably close approximation of the target item. We report the mean number of steps it takes for the rank of the target item to drop below a given recall cutoff . Boltzmann exploration consistently outperforms the other strategies. Greedy and Noiseless surpass Random, but their advantage diminishes at higher rank cutoffs.
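The convergence measurement can be sketched in the same representation, with each session stored as a list of normalized ranks per timestep. How sessions that never reach the cutoff are handled is not stated here, so the sketch simply excludes them; that choice and the name `mean_steps_to_cutoff` are assumptions.

```python
def mean_steps_to_cutoff(sessions, cutoff):
    """Mean number of timesteps until the target's normalized rank first
    drops to the cutoff or below.

    ASSUMPTION: sessions that never reach the cutoff are left out of the
    average. Returns None when no session reaches it.
    """
    steps = []
    for ranks in sessions:
        for step, rank in enumerate(ranks, start=1):
            if rank <= cutoff:
                steps.append(step)
                break
    return sum(steps) / len(steps) if steps else None


# Made-up sessions for illustration only.
sessions = [
    [0.50, 0.20, 0.05],
    [0.90, 0.60, 0.40],
    [0.30, 0.10, 0.20],
]
print(mean_steps_to_cutoff(sessions, cutoff=0.2))
```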
5.3. Discussion
We would like to point out that the experimental setup described above is not restrictive. Although we do present a window of items with which the user interacts, the user can expand the window size by explicitly clicking on an “Explore More” option. Once in the expanded view, scrolling down past the last displayed item triggers the display of additional items in an infinite scroll mode covering the whole catalog. Since we maintain an explicit ranking on all items, this mode of experimentation merges naturally with our algorithm.
Infinite scroll, such as home feeds on social media websites like Facebook and Twitter, may create a better user experience and allow the user to browse quickly. But when the target is explicit, such an infinite scroll feature makes our experimental framework trivial: the user can just scroll until they find the target. This prevents us from gaining insight about convergence, which explains why we limited our study to the windowed version of the application. We acknowledge that this windowed setup is more restrictive than the full user experience.
Additionally, the catalog contains multiple similar items. This leads to a large number of identical feature-vector representations, making it challenging to surface the target item among items in just timesteps. Hence, it is likely that the Section 5.2 experimental results are pessimistic, as users are constrained from browsing the search space efficiently. Nevertheless, as the top-most items get the most visibility, Seeker’s ability to quickly zoom-in to the item of interest remains crucial.
On occasion, the Seeker interface produces pages with very similar items ranked closely, leading to a lack of exploration. Two items which look mostly identical are likely to have similar vector representations and hence may appear adjacent to each other (Nassif et al., 2016). In a deterministic setting, this would have resulted in a page full of very similar items, and prevented the user from pivoting to other parts of the catalog. Although Boltzmann exploration offers a principled remedy, depending on the use case, one may want to model additional uncertainty into the user actions in the early stages of the interactions. As a remedy, we can modify the posteriors by adding noise, using submodular functions (Chen et al., 2017), or determinantal point processes (Affandi et al., 2012).
The constant in Algorithm 3 is borrowed from the work in (Cesa-Bianchi et al., 2017) which uses the non-contextual stochastic multi-armed bandit setting. Under their setting is a reasonable estimate to bound variance. However, items in our search space have features that are shared and correlated. Our sampling strategy currently does not take into account this covariance properly when making recommendations. We leave augmenting our sampling algorithm with a new variance bound for future work.
6. Future Work
We are considering improving this work on multiple fronts.
6.1. Bipartite Preference Model
When constructing the preferences from user feedback in Section 3.2, we treat the likes and dislikes independently, valuing them equally. However, intuitively, if we put more emphasis on likes, we may be able to find the target faster. Likes are less ambiguous than dislikes: likes have a clearer implication when treated in isolation, while dislikes usually require context to be a useful learning signal. In fact, we empirically observe that likes cluster more tightly with one another than dislikes do.
We can mathematically model the emphasis of likes over dislikes by assuming the likes are independent of each other but the dislikes are conditioned on the likes. Conditional dependencies can be expressed in the form of a bipartite graph as shown in Figure 5.
Using Bayes rule, we can represent Figure 5 as:
[TABLE]
Here we assume that:
[TABLE]
and
[TABLE]
with .
We interpret the model in the following way. Equation 6.2 conveys that the probability of liking item is proportional to how similar is to target . Equation 6.3 conveys that the probability of disliking item is proportional to its relative distance to the target as compared to the relative distance between the target and liked items . One can quantify the distance between and in different ways. Here we propose to reflect the customer’s gradual approach towards the target. We leave the evaluation of this model to future work.
6.2. Incorporating Additional Feedback
So far, the only form of feedback that the user provides is in the form of likes and dislikes. Consider the situation where the user provides feedback in the form of a text or utterance. This transitions us into the guided conversational search paradigm and we could incorporate some of the strategies described in (Huang et al., 2018; Wen et al., 2016; Shah et al., 2018).
Assume we have a technique (such as an LSTM that produces embeddings) to map spoken feedback into a vector. We want to incorporate this feedback into the model. Equation 4.3 becomes:
[TABLE]
where
[TABLE]
To estimate , we change Equation 3.2 to:
[TABLE]
Although we used text/speech as an example, the additional feedback embedding can originate from an arbitrary source. Similarly, we can incorporate extra feedback into the bipartite preference model of Section 6.1.
6.3. Personalized Recommendations
Another possible direction is to personalize Seeker. Let be an embedding vector for each user. The dataset now comes in the form of quadruplets , where each user has a target and pairs of likes and dislikes.
To personalize, we define a synthetic embedding kernel , where denotes an item. For example, we can use element-wise product:
[TABLE]
Now, we can substitute this kernel into our modeling formulas, replacing any item with personalized embedding .
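The element-wise product kernel is simple to sketch. The function name `personalize` is hypothetical, and the user and item embeddings are assumed to share the same dimensionality.

```python
def personalize(user_vec, item_vec):
    """Combine a user embedding and an item embedding element-wise."""
    if len(user_vec) != len(item_vec):
        raise ValueError("user and item embeddings must have equal length")
    return [u * x for u, x in zip(user_vec, item_vec)]


user = [1.0, 0.5, 0.0]  # made-up user embedding
item = [2.0, 4.0, 6.0]  # made-up item embedding
print(personalize(user, item))
```

A user weight of zero on a dimension removes that dimension from every distance computation for that user, so two users can perceive different neighbors for the same item.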
7. Conclusion
This paper presents Seeker, an interactive, real-time search system. Seeker allows users to search for products even when it is difficult to describe them in words. Unlike embedding-based search engines, this method does not require a previously known representation of the desired item. With interactive binary feedback, our system learns to dynamically refine search results from the user’s preferences in real time. Our evaluation results show that our Boltzmann exploration method allows users to find their products more quickly and more consistently than alternative exploration strategies.
Acknowledgements.
We gratefully acknowledge Kevin Jamieson and Lalit Jain for sharing their experience of designing adaptive algorithms. We would like to thank Miguel Jimenez Gomez, Xiaopeng Zhang and Andrea Matsunaga for their software development expertise. In addition, we thank the volunteers for their help in performing the user study. Finally, we thank Amazon for the opportunity to conduct this research project.
Appendix A Asymptotic Sampling Equivalence
Given as a compact (i.e. closed and bounded) subset of . Let be a continuous probability density function. Let the set consist of points ’s that are equally spaced on in the grid-like manner such that . Consider the following two ways of sampling over :
- (1) Each time, sample from on , and choose if and only if , where the metric is usually the Euclidean metric. We assume argmin is unique.
- (2) Each time, sample each from the discrete distribution on so that is chosen with probability .
Define:
[TABLE]
Prove that .
Proof.
Partition into disjoint regions in the grid-like manner such that for each for each ,
[TABLE]
Since the points ’s are equally spaced on , the regions ’s all have the same measure: , where is the (fixed) Lebesgue measure of . So
[TABLE]
Here the first equation holds by the definition of ’s, and the second by the Mean Value Theorem (MVT) for some . On the other hand,
[TABLE]
where the second equation holds by the additivity of integral, and the last equation holds by the MVT for some .
Since is continuous on the compact subset of , there is an upper bound such that . Moreover, by the Heine-Cantor theorem, is uniformly continuous on .
Now fix , . By uniform continuity of on , there exists such that for all with , we have where
[TABLE]
Because the regions ’s are partitioned in the grid-like manner, there exists such that for all , the diameter of each is smaller than . This implies for all and for all . Hence for all , we have
[TABLE]
which implies:
[TABLE]
Therefore for each ,
[TABLE]
This implies that for all ,
[TABLE]
This ends the proof. ∎
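The asymptotic equivalence can also be checked numerically in one dimension. The sketch below is an illustration under assumptions of our own, not part of the proof: the domain is the interval [0, 1], the density is f(x) = 0.75 (1 + x^2), and the grid points are the midpoints of n equal cells. It compares the probability that a grid point is the nearest neighbour of a draw from f (scheme 1) with its probability under the discrete distribution proportional to f at the grid points (scheme 2).

```python
def density(x: float) -> float:
    """Assumed example density on [0, 1]: f(x) = 0.75 * (1 + x^2)."""
    return 0.75 * (1.0 + x * x)


def cdf(x: float) -> float:
    """Closed-form integral of the assumed density from 0 to x."""
    return 0.75 * (x + x ** 3 / 3.0)


def nearest_point_probs(n: int) -> list:
    # Scheme (1): the midpoint of cell [i/n, (i+1)/n] is the nearest grid
    # point for every x in that cell, so its probability is the cell's mass.
    return [cdf((i + 1) / n) - cdf(i / n) for i in range(n)]


def discrete_probs(n: int) -> list:
    # Scheme (2): each grid point is chosen proportionally to f at that point.
    weights = [density((i + 0.5) / n) for i in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def max_relative_gap(n: int) -> float:
    """Largest relative difference between the two schemes over all grid points."""
    p = nearest_point_probs(n)
    q = discrete_probs(n)
    return max(abs(pi / qi - 1.0) for pi, qi in zip(p, q))


for n in (10, 100, 1000):
    print(n, max_relative_gap(n))
```

Under these assumptions the printed gap shrinks as the grid is refined, which is the behaviour the appendix establishes in general.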
Appendix B Setting Parameter
Given an item, a user can like or dislike it. Our rewards are thus binary, making the reward distribution -subgaussian with variance factor . We follow the computations of Theorem 3 in (Cesa-Bianchi et al., 2017) with a standard Gumbel noise (see Equations 4.7 and 4.8). We do not introduce the extra variable in the proof of Lemma 3, setting . We thus bound the regret as:
[TABLE]
Here the finite horizon is the final timestep, and the gap is the difference between the mean reward of the optimal item, and the mean reward of item .
Although may potentially be specified, is unknown. To obtain a small regret, the authors recommend setting . But one can easily see that choosing leads to an even smaller regret. We therefore set .
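To make the role of this parameter concrete, here is a minimal sketch of item selection with Gumbel noise, in the style of the Boltzmann-Gumbel exploration of Cesa-Bianchi et al. (2017). It is an assumption-laden illustration and does not reproduce the paper's Equations 4.7 and 4.8: the names `select_item`, `mean_rewards`, `counts`, and `c` are hypothetical, and each item's noise scale is assumed to be `c / sqrt(count)`.

```python
import math
import random


def standard_gumbel(rng: random.Random) -> float:
    """Draw from a standard Gumbel distribution by inverse-CDF sampling."""
    # Clamp away from 0 and 1 so that both logarithms are well defined.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def select_item(mean_rewards, counts, c, rng):
    """Return the index maximizing (empirical mean + scaled Gumbel noise).

    Assumes every item has been shown at least once (all counts >= 1).
    """
    scores = []
    for mean, n in zip(mean_rewards, counts):
        beta = c / math.sqrt(n)  # assumed noise scale; shrinks with more feedback
        scores.append(mean + beta * standard_gumbel(rng))
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical usage: three items with empirical like-rates and display counts.
rng = random.Random(0)
choice = select_item([0.2, 0.5, 0.4], [3, 5, 2], c=0.5, rng=rng)
print(choice)
```

In this sketch a larger `c` yields more exploration of rarely shown items, while a smaller `c` concentrates choices on items with the highest empirical like-rate.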
References
- Emile Aarts and Jan Korst. 1988. Simulated Annealing and Boltzmann Machines.
- Raja Hafiz Affandi, Alex Kulesza, and Emily B. Fox. 2012. Markov Determinantal Point Processes. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI).
- Artem Babenko, Anton Slesarev, Alexander Chigorin, and Victor S. Lempitsky. 2014. Neural Codes for Image Retrieval. In The European Conference on Computer Vision (ECCV). Zurich, Switzerland, 584–599.
- Sébastien Bubeck, Nicolò Cesa-Bianchi, et al. 2012. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends® in Machine Learning 5, 1 (2012), 1–122.
- Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, and Gergely Neu. 2017. Boltzmann Exploration Done Right. In Advances in Neural Information Processing Systems. 6284–6293.
- Olivier Chapelle and Lihong Li. 2011. An Empirical Evaluation of Thompson Sampling. In Advances in Neural Information Processing Systems. 2249–2257.
- Lin Chen, Andreas Krause, and Amin Karbasi. 2017. Interactive Submodular Bandit. In Advances in Neural Information Processing Systems 30 (NIPS). 141–152.
