Enhancing the long-term performance of recommender system

Leyang Xue; Peng Zhang; An Zeng

arXiv:1904.00672·physics.soc-ph·July 2, 2019

TL;DR

This paper introduces ARL, a novel method to improve long-term recommendation accuracy in recommender systems by balancing diversity and user preferences, validated through a network evolution model.

Contribution

The paper proposes the ARL approach to enhance long-term recommendation performance and demonstrates its robustness and effectiveness through simulation.

Findings

01

Long-term recommendation accuracy is significantly improved.

02

Diversity of items in the system is maintained.

03

An optimal parameter n* balances diversity and user preferences.

Tables

Table 1: The detailed statistical characteristics of the seven experimental datasets

Datasets	$N_{user}$	$N_{item}$	$N_{edge}$	$\langle k_{user} \rangle$	$\langle k_{item} \rangle$	Sparsity
Delicious	868	2,835	4,812	5.54	1.70	$0.20 \times 10^{-2}$
Amazon	900	3,868	7,623	8.47	1.97	$0.22 \times 10^{-2}$
Stack Overflow	1,199	1,977	7,886	6.58	3.99	$0.33 \times 10^{-2}$
Epinions	1,199	2,978	48,435	40.40	16.26	$1.36 \times 10^{-2}$
Netflix	1,014	1,977	35,821	35.33	18.12	$1.79 \times 10^{-2}$
Douban	846	2,997	66,647	78.78	22.24	$2.63 \times 10^{-2}$
Movielens	943	1,682	55,375	58.72	32.92	$3.49 \times 10^{-2}$

Table 2: The statistical characteristics of the experimental datasets

Datasets	$n^{*}$	Sparsity
Delicious	20	$0.20 \times 10^{-2}$
Amazon	30	$0.22 \times 10^{-2}$
Stack Overflow	150	$0.33 \times 10^{-2}$
Epinions	470	$1.36 \times 10^{-2}$
Netflix	270	$1.79 \times 10^{-2}$
Douban	400	$2.63 \times 10^{-2}$
Movielens	320	$3.49 \times 10^{-2}$

Table 3: The Spearman correlation between $n^{*}$ and sparsity

Spearman correlation	$n^{*}$	Sparsity
$n^{*}$	1	0.75
Sparsity	0.75	1
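The correlation in Table 3 can be reproduced from the seven rows of Table 2. The sketch below is an illustration rather than the authors' code: it assumes the rank-difference form of the Spearman coefficient, which is valid here because neither column contains ties, and the helper names `ranks` and `spearman` are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Values copied from Table 2, in row order (Delicious ... Movielens).
n_star = [20, 30, 150, 470, 270, 400, 320]
sparsity = [0.20e-2, 0.22e-2, 0.33e-2, 1.36e-2, 1.79e-2, 2.63e-2, 3.49e-2]

rho = spearman(n_star, sparsity)
print(round(rho, 2))  # 0.75
```

Only Epinions, Netflix and Movielens are ranked differently in the two columns (rank differences 3, -1 and -2), so the sum of squared differences is 14 and the coefficient is 1 - 84/336 = 0.75.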

Equations

$$RL^{\prime} = ARL(RL, n)$$

$$RS_{u\alpha} = \frac{l_{u\alpha}}{L_{u}}$$

$$G = \frac{2\sum_{\alpha=1}^{N} \alpha k_{\alpha}}{N\sum_{\alpha=1}^{N} k_{\alpha}} - \frac{N+1}{N}$$

$$S_{\alpha\beta}^{Jaccard} = \frac{|\Gamma(\alpha)\cap\Gamma(\beta)|}{|\Gamma(\alpha)\cup\Gamma(\beta)|}$$

$$Sparsity = \frac{N_{link}}{N_{item} \times N_{user}}$$
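As a quick check of the sparsity definition, the following sketch recomputes the mean degrees and the sparsity of the Movielens row of Table 1 from its node and edge counts. The function name `network_stats` is hypothetical, and the sketch assumes that the mean degrees in Table 1 are simply the edge count divided by the number of users or items.

```python
def network_stats(n_user, n_item, n_edge):
    """Mean user degree, mean item degree and sparsity of a bipartite network."""
    return {
        "k_user": n_edge / n_user,               # average number of links per user
        "k_item": n_edge / n_item,               # average number of links per item
        "sparsity": n_edge / (n_item * n_user),  # fraction of possible links present
    }


# Movielens row of Table 1: 943 users, 1,682 items, 55,375 edges.
stats = network_stats(n_user=943, n_item=1682, n_edge=55375)
print(round(stats["k_user"], 2), round(stats["k_item"], 2), round(stats["sparsity"] * 100, 2))
```

The three printed values correspond to the $\langle k_{user} \rangle$, $\langle k_{item} \rangle$ and Sparsity (in units of $10^{-2}$) columns of the table.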

Full text

Enhancing the long-term performance of recommender system

Leyang Xue

Peng Zhang

An Zeng

School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, P.R. China.

School of Systems Science, Beijing Normal University, Beijing 100875, P.R. China.

Abstract

The recommender system is a critically important tool in online commercial systems, providing users with personalized recommendations of items. So far, numerous recommendation algorithms have been proposed to improve the performance of single-step recommendation, while long-term recommendation performance has been neglected. In this paper, we propose an approach called Adjustment of Recommendation List (ARL) to enhance the long-term recommendation accuracy. In order to observe the long-term accuracy, we developed a network evolution model to simulate the interaction between the recommender system and users' behaviour. The results show that not only can the long-term recommendation accuracy be enhanced significantly, but the diversity of items in the online system also remains healthy. Notably, an optimal parameter $n^{*}$ of ARL exists in long-term recommendation, indicating that there is a trade-off between keeping the diversity of items and keeping users' preferences in order to maximize the long-term recommendation accuracy. Finally, we confirmed that the optimal parameter $n^{*}$ is stable while the network evolves, which reveals the robustness of the ARL method.

keywords:

Long-term, Recommender system, Evolution model, Robustness

1 Introduction

Over the past decade, with the rapid development of the internet economy, the recommender system has become an essential tool for addressing the information overload problem[1, 2]. Recommender systems, such as those of Amazon[5] and Tmall, provide users with personalized suggestions of the most relevant items by analyzing their historical interactions and profiles[3, 4]. A number of algorithms based on various ideas and concepts have been developed to personalize the online store for each customer. These recommendation algorithms include context-based analysis[6, 7], collaborative filtering[8, 9], matrix decomposition[10, 11], deep learning[12, 13] and so on. While most of the aforementioned algorithms act on data with ratings, unary data without ratings can be handled by a class of network-based recommendation algorithms[14] that represent the input data as a bipartite network in which users are linked to the items they have selected[15]. Recently, considerable attention has been paid to network-based recommendation algorithms, such as mass diffusion (MD)[16], heat conduction (HC)[17], the hybridization of mass diffusion and heat conduction[18] and numerous extensions based on them[19, 20, 21].

These methods perform fairly well in both accuracy and diversity in a single-step recommendation, which reflects only their short-term performance. Short-term performance is one of the most important goals, but not the ultimate goal, in designing recommendation algorithms[22]. Internet companies originally adopted recommender systems because they can yield considerable profits from niche items, which enjoy higher profit margins than popular items, whose margins are kept small by a more competitive market[23]. Hence, the ultimate goal is to broaden the scope of users' interests and recommend niche items that are rarely purchased. Besides, users' preferences change over time as they gain knowledge, maturity and experience, and the online network evolves over time. Thus, a recommendation algorithm that performs very well in short-term recommendation cannot guarantee long-term performance. Therefore, under successive recommendations, one has to investigate the impact of algorithms on the online system and observe their long-term performance.

Lately, some scholars have begun to study the long-term performance of recommendation algorithms and found that their accuracy decreases with time if the evolution of the online network fully depends on the recommendation[4, 22]. Many existing papers[24, 25, 26] have pointed out that methods with high accuracy in short-term recommendation tend to recommend popular items. As the online network evolves under such a recommendation algorithm, most edges become attached to a small number of popular items. Extremely popular items arise and receive substantial attention in subsequent steps, which drives the network to an unhealthy state and narrows users' choices. In this case, the long-term accuracy of recommendation algorithms that favor popular items declines, which is in general an intractable problem. So far, few studies have been done to further improve the long-term performance of recommendation. Hu et al.[4] find, by increasing the length of the recommendation list, that recommendation diversity is essential to keep a high long-term accuracy. Shi et al.[26] propose a personalized recommender based on users' preferences and obtain a better trade-off between the short-term and long-term performance of recommendation.

The major contribution of this work is to develop an evolution model of the bipartite network and to present a methodology called Adjustment of Recommendation List (ARL), which aims to enhance the long-term performance of recommender systems. The proposed approach is applied to extensively analyzed datasets, enabling us to directly compare our method with previously reported algorithms. Through the empirical study, we find that the long-term recommendation accuracy can be enhanced significantly without sacrificing the short-term accuracy. Interestingly, similar to some existing literature, the diversity in long-term recommendation is also improved. By tuning the parameter incorporated into ARL, we achieve better long-term performance of recommendation algorithms. This might indicate that there is a trade-off between adding diversity into the evolving network and retaining users' original preferences. In addition, the robustness of ARL has been confirmed.

2 Methods

An online commercial system can be modeled by a user-item bipartite network in which users and items are represented as nodes and an edge denotes that a user has selected an item. The bipartite network can be represented by an $M\times N$ adjacency matrix ($M$ users and $N$ items) where the element $A_{i\alpha}=1$ if user $i$ has selected item $\alpha$ and $A_{i\alpha}=0$ otherwise. In this paper, we employed mass diffusion (MD)[16] and the hybridization (Hyb)[18] of mass diffusion and heat conduction as recommendation algorithms to study long-term recommendation.
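The text does not restate the mass diffusion formula, so the sketch below assumes the standard two-step resource spreading usually associated with MD: each item collected by the target user starts with one unit of resource, every item splits its resource equally among the users who selected it, and every user then splits what was received equally among the items he or she selected. The function names and the toy adjacency matrix are hypothetical and serve only as an illustration.

```python
def mass_diffusion_scores(adj, user):
    """Final item resources for `user`; `adj` is an M x N list of 0/1 rows."""
    n_users, n_items = len(adj), len(adj[0])
    item_degree = [sum(adj[j][a] for j in range(n_users)) for a in range(n_items)]
    user_degree = [sum(row) for row in adj]

    # Step 1: each collected item spreads one unit equally to its users.
    user_resource = [0.0] * n_users
    for a in range(n_items):
        if adj[user][a]:
            for j in range(n_users):
                if adj[j][a]:
                    user_resource[j] += 1.0 / item_degree[a]

    # Step 2: each user spreads the received resource equally to its items.
    scores = [0.0] * n_items
    for j in range(n_users):
        if user_degree[j] > 0:
            for b in range(n_items):
                if adj[j][b]:
                    scores[b] += user_resource[j] / user_degree[j]
    return scores


def recommendation_list(adj, user):
    """Uncollected items of `user`, sorted by descending resource."""
    scores = mass_diffusion_scores(adj, user)
    uncollected = [a for a in range(len(scores)) if not adj[user][a]]
    return sorted(uncollected, key=lambda a: (-scores[a], a))


# Toy network with 3 users and 4 items (A[i][alpha] = 1 if user i selected item alpha).
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
scores = mass_diffusion_scores(A, user=0)
print(recommendation_list(A, user=0))  # [2, 3]
```

In this toy network user 0 starts with two units of resource, and the total is conserved by the two spreading steps. Item 2 ends with more resource than item 3, so it is ranked first among the uncollected items.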

2.1 Evolution model

In order to observe the long-term performance, we designed an evolution model in which the evolution of the online network is driven by recommendation algorithms. In the model, we assumed that users only focused on items ranked high in the recommendation list and would randomly take an item from a recommendation list of length L. In other words, users relied entirely on the recommendation when they selected items. A new link between the user and the selected item was then added to the bipartite network. At the same time, we randomly deleted a link to keep the network size fixed. The recommendation list in the next step was generated from the updated network containing these new links.

In the experiment, the dataset was randomly divided into a training set ($E_{T}$) and a probe set ($E_{P}$) in a ratio of 90% to 10%. The initial network consisted of the training set ($E_{T}$). In one macro-step of the simulation, the recommendation list of each user was generated by the recommendation algorithm (MD). To observe the long-term recommendation accuracy as directly and quickly as possible while saving computation, we set the length of the recommendation list (L) to 1, so that each user selected the highest-scoring item, namely the top-1 recommendation. The link between the user and the selected item was then added to the training set ($E_{T}$). Meanwhile, we randomly deleted a link from the user's historical record to keep the user's degree fixed. This is equivalent to a so-called breaking-rewiring process. The resulting new training set ($E_{T1}$) was used to generate the recommendation list for the next macro-step. We kept the probe set unchanged during the network evolution so that the long-term accuracy could converge. In each macro-step, the probe set was used to test the recommendation accuracy. The long-term recommendation accuracy is thus the cumulative result of network evolution driven by the recommendation algorithm. A diagram of the whole evolution process is shown in Fig. 1(a).
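One macro-step of the evolution model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all recommendation lists are generated from the same network snapshot before any link is rewired, the deleted link is drawn uniformly from the user's links in that snapshot, and `popularity_recommender` is a hypothetical stand-in that ranks items by degree (it is not mass diffusion) so that the block stays self-contained.

```python
import random


def popularity_recommender(user_items, user, n_items):
    """Hypothetical stand-in recommender (not MD): rank uncollected items by degree."""
    degree = [0] * n_items
    for items in user_items.values():
        for item in items:
            degree[item] += 1
    uncollected = [a for a in range(n_items) if a not in user_items[user]]
    return sorted(uncollected, key=lambda a: (-degree[a], a))


def macro_step(user_items, n_items, recommend, rng):
    """Every user takes the top-1 item (L = 1) and loses one random old link."""
    # Assumption: all lists come from the same snapshot of the network.
    picks = {}
    for user in user_items:
        rec_list = recommend(user_items, user, n_items)
        if rec_list:
            picks[user] = rec_list[0]

    updated = {}
    for user, items in user_items.items():
        items = set(items)
        if user in picks and items:
            items.remove(rng.choice(sorted(items)))  # break one old link
            items.add(picks[user])                   # wire the selected item
        updated[user] = items
    return updated


rng = random.Random(0)
network = {0: {0, 1}, 1: {0, 2}, 2: {1, 2, 3}}  # toy user -> items map
start_degrees = {u: len(items) for u, items in network.items()}
for _ in range(5):
    network = macro_step(network, n_items=5, recommend=popularity_recommender, rng=rng)
end_degrees = {u: len(items) for u, items in network.items()}
print(start_degrees == end_degrees)  # True: every user's degree is preserved
```

Passing the recommender as a function makes it straightforward to replace the stand-in with mass diffusion, or with a wrapper that adjusts the list before the top-1 item is taken.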

2.2 Adjustment of Recommendation List

Recommendation algorithms that tend to recommend popular items, e.g. mass diffusion[16] and collaborative filtering[8, 9], perform well in the short term[24, 25, 26]. Through successive recommendations, these methods reinforce the popularity of hot items, so that the system becomes dominated by a few extremely popular items and the long-term accuracy decreases.

Therefore, we proposed a method called Adjustment of Recommendation List (ARL) to enhance the long-term recommendation performance. The main idea of ARL is to diversify the personalized recommendation list to broaden the user's choice, which transfers some of the user's attention from extremely popular items to relatively cold items. There are many ways to implement this, such as the topic diversification approach[27]. For simplicity, we exchange the top-1 and top-$n$ items among those uncollected by the user, which forms a new recommendation list. A schematic is shown in Fig. 1(b). Here, we denote the exchange position by the parameter $n$ of ARL. The general mathematical expression is given in Eq. 1. By tuning the parameter $n$ within the network evolution model, we can control the diversity introduced into the evolving network and observe the long-term performance of recommendation algorithms.

$$RL^{\prime}=ARL(RL,n) \qquad (1)$$

where $n$ denotes the exchange position, $RL$ represents the recommendation list and $RL^{\prime}$ is the new recommendation list generated by ARL. Here $n\in[1,N_{L}]$, where $N_{L}$ is the number of items uncollected by the user.
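The exchange described by Eq. 1 amounts to swapping two positions of the list. A minimal sketch, assuming the recommendation list is an ordinary Python list sorted from best to worst and that positions are 1-based as in the text (the function name `arl` is chosen here for illustration):

```python
def arl(rl, n):
    """Return a copy of `rl` with the top-1 and top-n items exchanged."""
    if not 1 <= n <= len(rl):
        raise ValueError("n must satisfy 1 <= n <= len(rl)")
    adjusted = list(rl)  # leave the original list untouched
    adjusted[0], adjusted[n - 1] = adjusted[n - 1], adjusted[0]
    return adjusted


rl = ["a", "b", "c", "d"]
print(arl(rl, 3))  # ['c', 'b', 'a', 'd']
print(arl(rl, 1))  # ['a', 'b', 'c', 'd'], i.e. the original list
```

With $n=1$ the list is unchanged, so the adjusted recommender coincides with the underlying algorithm.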

ARL and the evolution model are essentially different. The evolution model was designed to study the adverse effect of long-term recommendation, in which the recommendation accuracy decreases significantly after a number of iterative recommendations. In practice, we assumed that each user would select the top-1 item in the recommendation list, and we added a link between the user and the selected item. The recommendation list in the next step was generated from the updated network containing these new links. In contrast, ARL is a method that aims to improve the recommendation. Before users make their selection, ARL re-orders the recommendation list generated by a specific recommendation algorithm by swapping the top-1 item and the top-$n$ item. This procedure changes the top-1 item in the recommendation list and hence the item that the user selects. If the reordering of the recommendation list is designed well, the long-term recommendation accuracy can be improved.

3 Experiments

3.1 Data

Seven datasets were used to conduct the experiments: Movielens[28], Netflix[29], Epinions[30], Stack Overflow[31], Amazon (https://www.amazon.com/), Delicious (http://www.thedeliciousgroup.com/) and Douban (https://www.douban.com/). Movielens and Netflix are similar and contain users' ratings of movies on a scale from 1 (i.e. worst) to 5 (i.e. best). In order to create an unweighted bipartite network, we treated a rating of at least 4 as a valid link between a user and an item. The Movielens dataset includes 943 users and 1682 items (see Table 1), and the resulting network contains 55375 links. The Netflix network contains 1014 users and 1977 items connected by 35821 links whose ratings are higher than 3. Epinions is an online product rating site from which users and items can be obtained. Stack Overflow is the main question-and-answer website of the Stack Exchange Network; in this dataset, nodes represent users and posts, and an edge denotes that a user has marked a post as a favorite. Douban is a website where users provide comments and remarks on movies, videos and books. The Amazon, Delicious and Epinions datasets were acquired by crawling the websites. We extracted subsets from five original datasets (Epinions, Stack Overflow, Amazon, Delicious, Douban) to reduce the computation. Detailed information can be found in Table 1. In the simulation, we randomly divided each dataset into a training set $E_{T}$ and a probe set $E_{P}$. The training set contained 90% of the edges, and the remaining edges constituted the probe set. Obviously, $E_{T}\cap E_{P}=\emptyset$ and $E_{T}\cup E_{P}=E$. All simulation results were obtained by averaging over ten independent experiments.

3.2 Metric

3.2.1 Ranking Score (RS)

Ranking Score[18] measures the ability of a recommendation algorithm to generate an ordering of items that matches the user's preference. For a target user, a recommendation list is produced by the recommendation method according to the user's historical record. For each item in the probe set, we measure its rank in the recommendation list. An accurate recommendation algorithm is expected to place the items of the probe set near the top of the list, which leads to a small ranking score. The ranking score of a target user is obtained by averaging over all of the user's entries in the probe set. The ranking score is defined as follows:

$$RS_{u\alpha}=\frac{l_{u\alpha}}{L_{u}}, \qquad (2)$$

where $l_{u\alpha}$ is the rank of item $\alpha$ in the recommendation list of user $u$, and $L_{u}$ denotes the number of uncollected items, namely the length of the recommendation list in offline testing. The ranking score of the whole system is obtained by averaging $RS_{u\alpha}$ over all users. Obviously, $RS\in(0,1)$. The smaller the ranking score, the higher the recommendation accuracy of the algorithm. In this work, user-item pairs from the probe set may appear in the evolving network, in which case the resource score of such an item is zero. To calculate the ranking score reasonably, these items are placed in the recommendation list and ranked last, which corresponds to treating already-selected items as not recommendable in the current macro-step. In fact, these user-item pairs may be removed from the evolving network in the next macro-step. In the actual calculation, the ranking score is computed as $RS_{u}=\sum_{\alpha\in probe_{u}}\frac{{l}^{\prime}_{u\alpha}}{L}$, where ${l}^{\prime}_{u\alpha}$ is the rank of probe-set item $\alpha$ in the recommendation list and $L$ is the number of all items in the system. The value obtained from $RS_{u}$ underestimates the original RS, but this does not affect the comparison of long-term accuracy between mass diffusion and ARL.
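The basic definition $RS_{u\alpha}=l_{u\alpha}/L_{u}$ can be sketched as below. The sketch assumes one reading of the averaging described in the text: ratios are first averaged over each user's probe items and then over users. It does not implement the modified variant with $L$ equal to the total number of items. The function name and the toy data are hypothetical.

```python
def ranking_score(rec_lists, probe):
    """Mean of l_ua / L_u, averaged per user and then over users.

    rec_lists: user -> ordered list of that user's uncollected items
    probe:     user -> set of probe-set items of that user
    """
    per_user = []
    for user, probe_items in probe.items():
        rec_list = rec_lists[user]
        # 1-based rank of every item in this user's recommendation list.
        position = {item: rank for rank, item in enumerate(rec_list, start=1)}
        ratios = [position[a] / len(rec_list) for a in probe_items if a in position]
        if ratios:
            per_user.append(sum(ratios) / len(ratios))
    return sum(per_user) / len(per_user)


rec_lists = {0: ["a", "b", "c", "d"], 1: ["x", "y"]}
probe = {0: {"a", "c"}, 1: {"y"}}
rs = ranking_score(rec_lists, probe)
print(rs)  # 0.75
```

In the toy data, user 0 has probe items at ranks 1 and 3 of a list of length 4 (average 0.5) and user 1 has a probe item at rank 2 of a list of length 2 (score 1.0), giving a system score of 0.75.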

3.2.2 Gini coefficient

We employed the Gini index[32] to measure the heterogeneity of the item popularity distribution in long-term recommendation. The Gini index was originally proposed to assess income inequality among the inhabitants or families of a country[33], and it has since been widely used to measure dispersion in other fields[34, 35, 36, 37]. The Gini index can be calculated with the following equation:

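$$G=\frac{2\sum_{\alpha=1}^{N}\alpha k_{\alpha}}{N\sum_{\alpha=1}^{N}k_{\alpha}}-\frac{N+1}{N}, \qquad (3)$$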

where $k_{\alpha}$ denotes the degree of item $\alpha$, representing the popularity of item $\alpha$ in the system. The items $\alpha=1$ to $N$ are indexed in non-decreasing order of degree ($k_{\alpha}\leq k_{\alpha+1}$), and $N$ is the number of items. The value of the Gini index ranges between zero and one, which correspond to equal popularity of items (i.e. every item has the same degree) and completely unequal popularity of items (i.e. only one item is selected by users, while all other items have zero degree) respectively. A higher Gini coefficient indicates a more heterogeneous distribution, and vice versa. In some sense, the Gini index can be regarded as a health indicator of the whole system. For example, when the Gini index equals one, all edges are linked to one item and the other items have no links. In this extreme case, the diversity of items in the system is very poor and there is no valid or valuable information that can be used to make recommendations. In addition, the heterogeneity of the item popularity distribution reflects the diversity of items in the system.
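The Gini index $G=\frac{2\sum_{\alpha}\alpha k_{\alpha}}{N\sum_{\alpha}k_{\alpha}}-\frac{N+1}{N}$ can be evaluated directly from the list of item degrees. The sketch below is illustrative; the function name `gini` is an assumption, and returning zero for a network without links is a convention chosen here, not something stated in the text.

```python
def gini(degrees):
    """Gini index of item degrees, with items indexed in non-decreasing order."""
    ordered = sorted(degrees)
    n = len(ordered)
    total = sum(ordered)
    if total == 0:
        return 0.0  # convention for an empty network (assumption)
    weighted = sum(alpha * k for alpha, k in enumerate(ordered, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([3, 3, 3, 3]))  # 0.0: every item equally popular
print(gini([0, 0, 0, 4]))  # 0.75: all links attached to one of four items
```

For a finite system in which a single item holds every link, the formula gives $1-1/N$, which approaches one as the number of items grows.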

3.2.3 Jaccard index

The Jaccard index[38] was proposed more than one hundred years ago to measure the similarity between finite sample sets. Here, we use the Jaccard index to calculate the similarity between the top-1 and top-$n$ items. The Jaccard index is defined as follows:

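$$S_{\alpha\beta}^{Jaccard}=\frac{|\Gamma(\alpha)\cap\Gamma(\beta)|}{|\Gamma(\alpha)\cup\Gamma(\beta)|}. \qquad (4)$$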

For each item $\alpha$, $\Gamma(\alpha)$ denotes the set of neighbors of item $\alpha$, namely the set of users that have selected item $\alpha$. $|\Gamma(\alpha)\cap\Gamma(\beta)|$ is the number of common neighbors of items $\alpha$ and $\beta$. The value of the Jaccard index ranges from 0 to 1.
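With the neighbor sets $\Gamma(\alpha)$ stored as Python sets of user identifiers, the Jaccard similarity of two items is a ratio of set sizes. The sketch below is illustrative; returning zero when both items have no neighbors is a convention assumed here, since the text does not cover that case.

```python
def jaccard(neighbors_a, neighbors_b):
    """Jaccard similarity between the user sets of two items."""
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0  # neither item has been selected by anyone (assumption)
    return len(neighbors_a & neighbors_b) / len(union)


# Users who selected a hypothetical top-1 item and a hypothetical top-n item.
top1_users = {1, 2, 3}
topn_users = {2, 3, 4}
print(jaccard(top1_users, topn_users))  # 0.5
```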

4 Results

In this section, we conducted a set of experiments to examine the long-term recommendation performance of ARL. The ranking score and the Gini index were used to measure the recommendation accuracy and the heterogeneity of the item popularity distribution in the system respectively. We employed the Jaccard index to evaluate the similarity between the top-1 and top-$n$ items.

4.1 The long-term recommendation performance of ARL

The parameter $n$ has a significant impact on the long-term recommendation performance. For instance, if the value of $n$ is very large, the similarity between the top-1 and top-$n$ items is lower, which introduces more diversity into the evolving network. At the same time, this also results in a loss of the user's preference, which in turn reduces the long-term recommendation accuracy. Therefore, it is important for long-term recommendation to choose an appropriate parameter.

We show the recommendation accuracy of ARL on Movielens and Netflix for different $n$ at different macro-steps in Fig. 2(a)(b). Note that ARL(1) degenerates to the original mass diffusion. The curves of the four algorithms tend to stabilize after 250 macro-steps in Fig. 2(a)(b). Therefore, we regarded the average RS from macro-step 250 to 500 as the long-term recommendation accuracy. In Fig. 2(a), one can see that ARL(300) has the lowest accuracy loss in long-term recommendation. Compared with mass diffusion, ARL(300) improves the long-term recommendation accuracy by 38%, which confirms that the proposed ARL method is effective (more results can be seen in Fig. 6-10(a), Supplementary). However, the improvements in long-term accuracy for ARL(20) and ARL(1000) are not pronounced, although both enhance the long-term recommendation accuracy. A possible explanation can be seen in Fig. 2(c)(d). As expected, the similarity between the top-1 and top-20 items is relatively high, so more relevant items can be recommended to users as the network evolves. Meanwhile, the top-20 item is also very popular, indicating that less diversity is introduced into the evolving network. The joint effect of these two factors is that most links in the resultant network are still connected to popular items (this is also confirmed in Fig. 3(a)), leading to a smaller improvement in long-term recommendation accuracy. In contrast, both the similarity between the top-1 and top-1000 items and the popularity of the top-1000 item are extremely low, which reveals that too many irrelevant niche items are introduced into the evolving network, so that the resultant network contains less valuable information about users' real preferences. This might explain why the enhancement by ARL(1000) in long-term recommendation is not pronounced. Therefore, to keep a high long-term recommendation accuracy, it is essential both to maintain a certain diversity and to retain users' preferences. The same result can also be seen in Fig. 2(b).

We then studied the health of the system under long-term recommendation. The Gini index was employed to measure the heterogeneity of the item popularity distribution during the network evolution. We show the Gini index of ARL for different parameters at different macro-steps in Fig. 3(a)(b). The Gini index of MD and ARL(20) increases gradually with the macro-step and finally stabilizes, suggesting that item popularity becomes more heterogeneous under successive recommendations. Moreover, ARL(20) still results in the rise of extremely popular items and reduces item diversity in long-term recommendation. In contrast, the Gini index of ARL(1000) decreases dramatically with the macro-step and eventually reaches values lower than those produced by the other algorithms, which indicates that more links become connected to niche items as the network evolves. Interestingly, the Gini index of ARL(300) stays at its short-term level. This might be an important indication of how to maintain higher accuracy in long-term recommendation. A reasonable explanation, supported by Fig. 2(c)(d), is that enhancing the long-term recommendation performance requires a trade-off between the introduction of diverse items and the retention of users' preferences. The diversity of items in different online systems can be improved by ARL (Fig. 6-10(b), Supplementary).

4.2 The long-term recommendation accuracy of hybrid recommendation method

The hybrid method of mass diffusion and heat conduction[18] was employed to observe the long-term recommendation accuracy. By tuning the parameter $\lambda$, the optimal recommendation accuracy can be obtained. We mainly focused on how the optimal parameter ($\lambda^{*}$) changes in long-term recommendation, and show the RS after 1 step and after 800 steps in Fig. 3(c), which correspond to the short-term and long-term recommendation accuracy respectively. The optimal $\lambda^{*}$ is 0.8 when the short-term recommendation accuracy reaches its maximum, whereas $\lambda^{*}$ is 1 in long-term recommendation. As the network evolves, $\lambda^{*}$ shifts to a larger value, corresponding to the heat conduction method. This is a natural result because heat conduction tends to recommend niche items to users. This analysis again confirms that the diversity of items is essential to enhance the long-term recommendation accuracy.

4.3 The optimal parameter of ARL

In order to maximize the long-term recommendation accuracy, we performed the experiment for different values of the parameter $n$ from 0 to 1000 with a step of 50. The case $n=1$ corresponds to the long-term recommendation accuracy of mass diffusion. For the different values of $n$, the recommendation accuracy stabilizes after 450 steps. Therefore, we regarded the average accuracy from step 450 to step 500 as the long-term recommendation accuracy. The scatter plot of the long-term recommendation accuracy of ARL as a function of $n$ is shown in Fig. 4. In Fig. 4(a), the value of RS decreases sharply with $n$, reaches its lowest value at 320 and then slowly increases. This suggests that ARL(320) maximizes the long-term recommendation accuracy at the current resolution of $n$, which can be interpreted as the trade-off between the introduction of item diversity and the retention of information about users' preferences. At the optimal $n^{*}$, the long-term recommendation accuracy is enhanced significantly compared with the mass diffusion algorithm. Apart from $n^{*}$ itself, other parameters around $n^{*}$ still maintain relatively high accuracy (see the points enclosed by the rectangle in Fig. 4(b)), confirming that the proposed ARL method is very robust. The same conclusion can be obtained from Fig. 4(b). The optimal parameter of ARL is also analyzed on other datasets (Fig. 6-10(c), Supplementary).
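The selection of $n^{*}$ described above, averaging RS over a late window of macro-steps and taking the $n$ with the smallest average, can be sketched as follows. The function names are hypothetical and the RS series are synthetic numbers invented purely to exercise the code; they are not results from the paper.

```python
def long_term_accuracy(rs_series, first_step, last_step):
    """Average RS over macro-steps first_step..last_step (1-based, inclusive)."""
    window = rs_series[first_step - 1:last_step]
    return sum(window) / len(window)


def optimal_n(rs_by_n, first_step, last_step):
    """Return the n with the smallest long-term RS, plus all window averages."""
    averages = {
        n: long_term_accuracy(series, first_step, last_step)
        for n, series in rs_by_n.items()
    }
    best_n = min(averages, key=averages.get)
    return best_n, averages


# Synthetic RS curves over four macro-steps for three values of n (toy data).
toy_rs = {
    1: [0.10, 0.14, 0.18, 0.20],
    50: [0.11, 0.12, 0.13, 0.13],
    100: [0.13, 0.14, 0.15, 0.15],
}
best_n, averages = optimal_n(toy_rs, first_step=3, last_step=4)
print(best_n)  # 50
```

In the setting of the text, the window would be steps 450 to 500 and the keys would be the scanned values of $n$.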

Finally, we show the optimal parameter $n^{*}$ that maximizes the accuracy at different macro-steps. As shown in Fig. 5(a)(b), the parameter $n^{*}$ gradually rises as the network evolves. It then tends to fluctuate around the gray line that represents the value of the optimal parameter ($n^{*}$) in long-term recommendation. The result suggests that less diversity should be introduced into the evolving network to maximize the accuracy in short-term recommendation. As the macro-step increases, more diversity should be added to the evolving network. Besides, the optimal parameter $n^{*}$ stays within a certain range for different macro-steps (see the two dotted lines in Fig. 5(a)(b)), which allows ARL to maintain high recommendation accuracy across macro-steps with a fixed parameter. For example, the optimal parameter $n^{*}$ at different macro-steps is around 270 on the Netflix dataset. Thus, ARL(270) can achieve higher recommendation accuracy throughout the evolution of the network. This indicates that the optimal parameter of ARL is highly stable. These results have been confirmed on different datasets (Fig. 6-10(d), Supplementary). When ARL is applied to different datasets, we find that $n^{*}$ is related to the sparsity of the dataset: the denser the dataset, the greater the value of $n^{*}$ (Tables 2 and 3, Supplementary).

5 Conclusion

Numerous recommendation methods have been developed to extend personalized selection. Although some attention has been paid to long-term recommendation performance, much remains unknown about how to improve it. In this paper, based on the assumption that users only pay attention to items ranked high in the recommendation list, we proposed an evolution model to simulate successive recommendations between the recommender system and users, and used it to observe the long-term performance of recommendation algorithms. We found that the prediction accuracy of mass diffusion gradually decreased with the network evolution when users took the top-1 recommendation. Thus, the ARL method was proposed to enhance the long-term accuracy of the recommender system. The top-1 and top-$n$ items are exchanged to diversify the recommendation list, which introduces cold items into the evolving network. It is usually assumed that accuracy decreases when cold items are selected. In fact, the selected cold items also enrich the diversity of items in the online system, which enables the initial resources of network-based diffusion algorithms to cover more items and enlarges the range of potentially recommended items. Existing work[18] has also pointed out that a well-designed recommendation algorithm can improve both accuracy and diversity. This is because the higher accuracy results from precisely predicting the cold items liked by users. Interestingly, similar to some existing literature, we found that both the long-term accuracy and the diversity of the online system were improved by ARL. Besides, by tuning the parameter $n$, we showed that there is a trade-off between the introduction of item diversity and the retention of user preference in maximizing the long-term recommendation accuracy. Meanwhile, the optimal parameter $n^{*}$ fluctuated within a certain range in different periods of recommendation, which demonstrates that the ARL method is robust and stable.

Enhancing long-term recommendation performance can not only satisfy people's personalized needs but also increase the profit of the commercial system. Here, we proposed a novel framework, ARL, to achieve higher long-term recommendation accuracy. By tuning the parameter $n$, it can be widely applied to recommendation algorithms that tend to recommend popular items. Besides, there are a number of interesting extensions that could be pursued in the future. On the one hand, more realistic factors could be incorporated into the evolution model, such as the change of user preference with time (updating user preference based on the user's online click rate), the extent to which users rely on the recommendation, and the influence of social relationships among users. On the other hand, new recommendation algorithms could be designed that combine the short term and the long term. For example, a mechanism for the decay of user preference over time could be introduced into the recommendation algorithm.

Acknowledgements

This work is supported by the National Natural Science Foundation of China [Grant Nos. 61403037 and 61603046] and the Natural Science Foundation of Beijing [Grant No. L160008].

Supplementary Material

Enhancing the long-term performance of recommender system

Leyang Xue, Peng Zhang, An Zeng

In this supplementary material, we apply the ARL method to different datasets and compare its long-term recommendation accuracy with that of the original mass diffusion. For the Delicious, Amazon, Stack Overflow, Epinions and Douban data sets, we show the ranking score, the Gini index and the optimal parameter $n^{*}$ of ARL under different macro steps, as well as the long-term recommendation accuracy of ARL(n). One can see that the long-term performance of mass diffusion can be enhanced by ARL, which confirms the effectiveness and robustness of ARL. Besides, we analyze the impact of different datasets on $n^{*}$.
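For readers who want to reproduce the accuracy curves, the following is a minimal sketch of a ranking score (RS) computation. It assumes the common convention in which each probe link contributes the relative rank of its item among the user's uncollected items, so that a lower RS means higher accuracy; the exact definition used in the main text may differ in details such as tie handling. The names `ranking_score`, `scores`, `collected` and `probe` are illustrative.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def ranking_score(
    scores: Dict[Hashable, Dict[Hashable, float]],
    collected: Dict[Hashable, Set[Hashable]],
    probe: Iterable[Tuple[Hashable, Hashable]],
) -> float:
    """Average relative rank of probe items among each user's uncollected items."""
    per_link = []
    for user, item in probe:
        # Only items the user has not collected in the training data are ranked.
        candidates = [i for i in scores[user] if i not in collected[user]]
        # Highest score first; ties are broken by item label to stay deterministic.
        candidates.sort(key=lambda i: (-scores[user][i], str(i)))
        rank = candidates.index(item) + 1  # raises ValueError if the item is not ranked
        per_link.append(rank / len(candidates))
    return sum(per_link) / len(per_link)


# Toy example: user "u1" already collected "a"; the probe link is ("u1", "c").
toy_scores = {"u1": {"a": 0.9, "b": 0.5, "c": 0.2, "d": 0.1}}
toy_collected = {"u1": {"a"}}
toy_rs = ranking_score(toy_scores, toy_collected, [("u1", "c")])
print(toy_rs)
```

In the toy example, item "c" is ranked second out of three uncollected items, so its relative rank is 2/3.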

The analysis of ARL on different datasets

Delicious

In the Delicious data set, the accuracy for n = 20, 100, 300 under different macro steps is shown in Fig. 6 (a). Interestingly, the RS of mass diffusion in long-term recommendation is slightly lower than in single-step recommendation, which suggests that the long-term accuracy is improved compared with the short term. This result is mainly caused by the extreme sparsity of the data set: the coverage of the initial resource of mass diffusion is limited in a sparse network [39]. Through the evolution model and ARL, diverse items are added into the evolving network and links are rewired, which enables the resource to cover more items and thus to predict items in the probe set. Even in such a case, the long-term recommendation accuracy can still be enhanced by ARL(20). The same result can be seen in Fig. 6 (c), where some parameters yield higher long-term accuracy than mass diffusion. In Fig. 6 (d), most optimal parameters of ARL appear at 20, especially in long-term recommendation (from 400 to 500 macro steps), which confirms the parameter stability of ARL over different periods of recommendation.
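The limited coverage of the initial resource in a sparse network can be seen in a small sketch of mass diffusion. The code below follows the standard two-step formulation (items pass their resource equally to their users, who pass it equally back to their items); it is an illustrative reimplementation rather than the authors' code, and the toy network is invented for the example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def mass_diffusion_scores(
    user_items: Dict[Hashable, Set[Hashable]], target: Hashable
) -> Dict[Hashable, float]:
    """Two-step resource spreading on a user-item bipartite network."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Step 1: each item collected by the target holds one unit of resource
    # and splits it equally among the users who collected that item.
    user_resource = defaultdict(float)
    for item in user_items[target]:
        share = 1.0 / len(item_users[item])
        for user in item_users[item]:
            user_resource[user] += share

    # Step 2: each user splits the received resource equally among own items.
    item_resource = defaultdict(float)
    for user, resource in user_resource.items():
        share = resource / len(user_items[user])
        for item in user_items[user]:
            item_resource[item] += share
    return dict(item_resource)


# Toy sparse network (made up): item "d" shares no user with the items of "u1".
toy_network = {"u1": {"a", "b"}, "u2": {"b", "c"}, "u3": {"d"}}
md_scores = mass_diffusion_scores(toy_network, "u1")
print(sorted(md_scores.items()))
```

In this toy network, item "d" receives no resource at all and therefore cannot be ranked for "u1", whereas adding a single link that connects "d" to the rest of the network would make it reachable.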

Amazon

In Fig. 7, we show the results of ARL on the Amazon data set. In Fig. 7 (c), some parameters fall below the straight line, meaning that their long-term accuracy is higher than that of the original mass diffusion. This suggests that the ARL method is effective in improving the long-term accuracy. Since the long-term recommendation accuracy is highest at n = 30, we set n to 30, 50 and 100 and plot the RS and the Gini index as functions of the macro step in Fig. 7 (a) and (b), respectively. The enhancement of long-term performance for n = 30 is less pronounced in Amazon than in other datasets, especially for the Gini index. This is because, in a sparse dataset, most users selected few items and only a small number of users bought more niche items [39]. In such a condition, the improvement in the Gini index achievable through the breaking-rewiring process of links is limited. From Fig. 7 (d), one can see that most optimal parameters appear at n = 30 across macro steps, which shows that ARL(n) is robust rather than performing well only in a specific recommendation period.
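As a reference point for the Gini curves, here is a minimal sketch that computes a Gini coefficient from item degrees. It uses the standard rank-based formula, under which 0 means all items are equally popular and values near 1 mean a few items hold almost all links; the main text's definition may be normalized differently, so treat this as an assumption.

```python
from typing import Iterable


def gini(degrees: Iterable[float]) -> float:
    """Gini coefficient of a non-negative sequence, e.g. item degrees."""
    values = sorted(degrees)  # ascending order is required by the formula
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i), with i = 1..n
    weighted = sum((2 * rank - n - 1) * v for rank, v in enumerate(values, start=1))
    return weighted / (n * total)


print(gini([1, 1, 1, 1]))  # 0.0: every item has the same degree
print(gini([0, 0, 0, 4]))  # 0.75: one item holds all links
```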

Stack Overflow

The Stack Overflow data is denser than Delicious and Amazon. In Fig. 8 (c), the highest accuracy in long-term recommendation appears at n = 150. Compared with the optimal parameter in the sparse data sets, the value of n increases, which indicates that more diverse items need to be introduced into the evolving network. Besides, the improvement of accuracy and Gini index by ARL(150) is evident under different macro steps, as observed in Fig. 8 (a)(b). From Fig. 8 (d), we find that the value of $n^{*}$ continually increases from macro step 1 to 50 (as shown in the inset) and fluctuates around the line n = 150 after macro step 150. This suggests that fewer diverse items should be put into the evolving network in short-term recommendation and more in long-term recommendation.
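The per-step optimum $n^{*}$ discussed above can be located with a simple search over the tested parameters. The sketch below assumes that RS has already been measured for each candidate $n$ at each macro step and that lower RS is better; the function name `optimal_n_per_step` and the numbers in the toy input are invented for illustration and are not results from the paper.

```python
from typing import Dict, List


def optimal_n_per_step(rs_by_n: Dict[int, List[float]]) -> List[int]:
    """For each macro step, return the n with the lowest ranking score.

    rs_by_n maps a candidate n to its RS values, one value per macro step.
    Ties are resolved in favour of the smaller n.
    """
    lengths = {len(series) for series in rs_by_n.values()}
    if len(lengths) != 1:
        raise ValueError("all RS series must cover the same macro steps")
    steps = lengths.pop()
    return [
        min(rs_by_n, key=lambda n: (rs_by_n[n][t], n))
        for t in range(steps)
    ]


# Made-up RS values for three candidate parameters over three macro steps.
toy_rs_by_n = {
    10: [0.30, 0.33, 0.36],
    20: [0.31, 0.32, 0.34],
    30: [0.32, 0.33, 0.33],
}
print(optimal_n_per_step(toy_rs_by_n))  # [10, 20, 30]
```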

Epinions

The Epinions data is denser than the previously analysed data. ARL performs very well on the Epinions dataset. As shown in Fig. 9 (c), the accuracy of all parameters in long-term recommendation is higher than that of mass diffusion. Moreover, the lowest value of the curve appears at n = 470. One can see that the long-term accuracy can be enhanced significantly by ARL(470). A similar result can be seen in Fig. 9 (a)(b). These results show that our method is effective in improving the accuracy and diversity under different macro steps. In Fig. 9 (d), the $n^{*}$ at different macro steps fluctuates around n = 400, although the difference in long-term accuracy for parameters between 300 and 500 is rather slight.

Douban

The results obtained from the Douban data are shown in Fig. 10. The long-term accuracy and Gini index are greatly improved by ARL in Fig. 10 (a)(b). In addition, the parameters that enhance long-term accuracy span a larger interval in Fig. 10 (c). A similar result can be observed in denser datasets, such as Epinions, Netflix and MovieLens (Fig. 4 in the main text). Meanwhile, the accuracy of most parameters differs only slightly from that of the optimal parameter in long-term recommendation. In Fig. 10 (d), most optimal parameters under different macro steps fluctuate between 300 and 500. In fact, the difference in long-term recommendation accuracy across this range is very small (see Fig. 10 (c)), which again confirms that the ARL method is robust.

The effect of data set on $n^{*}$

We conducted experiments on different datasets and located the optimal value of the parameter ($n^{*}$) at the current resolution (a step of 10). We use the sparsity to characterize each dataset. Sparsity is calculated by Eq. 5:

$$\mathrm{Sparsity} = \frac{N_{edge}}{N_{user} \times N_{item}} \qquad (5)$$
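As a sanity check, the sketch below assumes that sparsity is the fraction of possible user-item links that are actually present, $N_{edge} / (N_{user} N_{item})$, and compares it with the Delicious and Amazon rows of the dataset statistics table. The function name `sparsity` is illustrative.

```python
def sparsity(n_user: int, n_item: int, n_edge: int) -> float:
    """Fraction of possible user-item links that exist in the bipartite network."""
    return n_edge / (n_user * n_item)


# Counts taken from the dataset statistics table (Delicious and Amazon rows).
delicious = sparsity(n_user=868, n_item=2835, n_edge=4812)
amazon = sparsity(n_user=900, n_item=3868, n_edge=7623)
print(f"Delicious: {delicious:.2e}, Amazon: {amazon:.2e}")
```

Both values agree with the tabulated sparsities ($0.20 \times 10^{-2}$ and $0.22 \times 10^{-2}$) after rounding, which supports this reading of the definition. Note that under this definition a larger value corresponds to a denser network.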

The detailed statistical characteristics are shown in Table 2. We then analyze the Spearman correlation between $n^{*}$ and sparsity (Table 3) and find a strong correlation between the optimal parameter and the sparsity of the network. This indicates that the value of the optimal parameter $n^{*}$ is related to the sparsity of the dataset. This is consistent with our expectation, because the coverage of the initial resource is limited in a sparse dataset. In such a condition, we need to introduce similar items, rather than more diverse items, into the evolving network to preserve users' preferences. On a denser dataset, however, diverse items need to be added into the evolving network, which further distributes the initial resource to cold items.
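The correlation analysis can be reproduced with a few lines of standard-library Python. The sketch below computes the Spearman coefficient as the Pearson correlation of average ranks; the helper names are illustrative and the input values are abstract placeholders, not the paper's measurements.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder inputs: one dataset-level quantity against another.
placeholder_x = [1, 2, 3, 4, 5]
placeholder_y = [10, 20, 20, 40, 50]
print(spearman(placeholder_x, placeholder_y))
```

Because only ranks enter the coefficient, it captures any monotonic relation between sparsity and $n^{*}$ without assuming linearity, which suits a comparison across a handful of datasets.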

Bibliography


[1] L. Lü, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender systems, Physics Reports 519 (1) (2012) 1–49. doi:10.1016/j.physrep.2012.02.006.
[2] J. Schafer, J. Konstan, J. Riedl, E-commerce recommendation applications, Data Mining and Knowledge Discovery 5 (1-2) (2001) 115–153.
[3] A. Zeng, C. H. Yeung, M. Medo, Y.-C. Zhang, Modeling mutual feedback between users and recommender systems, Journal of Statistical Mechanics: Theory and Experiment 2015 (7) (2015) P07020.
[4] X. Hu, A. Zeng, M.-S. Shang, Recommendation in evolving online networks, The European Physical Journal B 89 (2) (2016) 46. doi:10.1140/epjb/e2016-60509-9.
[5] G. Linden, B. Smith, J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing 7 (1) (2003) 76–80. doi:10.1109/MIC.2003.1167344.
[6] M. J. Pazzani, D. Billsus, Content-Based Recommendation Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007. doi:10.1007/978-3-540-72079-9_10.
[7] M. Balabanović, Y. Shoham, Fab: Content-based, collaborative recommendation, Communications of the ACM 40 (3) (1997) 66–72. doi:10.1145/245108.245124.
[8] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, J. Riedl, Applying collaborative filtering to usenet news, Communications of the ACM 40 (3) (1997) 77–87. doi:10.1145/245108.245126.
