The temporalized Massey's method
Massimo Franceschet, Enrico Bozzo

TL;DR
This paper introduces a dynamic, temporal adaptation of Massey's rating method for sports teams, updating ratings after each match based on performance and opponent strength, and demonstrates its predictive accuracy using Italian soccer data.
Contribution
It presents a novel temporalized version of Massey's method, integrating time dynamics into team ratings for improved predictive performance.
Findings
The method achieves high foresight prediction accuracy on Italian soccer data.
Temporalized Massey's method outperforms traditional static ratings.
The approach effectively captures team performance dynamics over time.
Abstract
We propose and thoroughly investigate a temporalized version of the popular Massey's technique for rating actors in sport competitions. The method can be described as a dynamic temporal process in which team ratings are updated at every match according to their performance during the match and the strength of the opponent team. Using the Italian soccer dataset, we empirically show that the method has a good foresight prediction accuracy.
Table 1: Foresight prediction accuracy of each rating method without and with home-field advantage (HFA).

| Method | Without HFA | With HFA |
|---|---|---|
| Temporalized Massey | 0.611 | 0.702 |
| Elo | 0.611 | 0.695 |
| Official | 0.589 | 0.674 |
Enrico Bozzo
Department of Mathematics, Computer Science, and Physics
University of Udine
Massimo Franceschet
Department of Mathematics, Computer Science, and Physics
University of Udine
1 Introduction
Rating and ranking in sport have a flourishing tradition. Each sport competition has its own official rating, from which a ranking of players and teams can be compiled. The challenge for many sports fans and bettors is to beat the official rating method: to develop an alternative rating algorithm that is better than the official one in the task of predicting future results. As a consequence, many sport rating methods have been developed. Amy N. Langville and Carl D. Meyer even wrote a (compelling) book about (general) rating and ranking methods entitled Who’s #1? [11].
In 1997, Kenneth Massey, then an undergraduate, created a method for ranking college football teams. He wrote about this method, which uses the mathematical theory of least squares, as his honors thesis [12]. Informally, at any given time $t$, Massey’s method rates a team $i$ according to the following two factors: (a) the difference between points for and points against $i$, or point spread of $i$, up to time $t$, and (b) the ratings of the teams that $i$ matched up to time $t$. Hence, highly rated teams have a large point differential and have matched strong teams so far. Lower in the ranking are teams that did well but had an easy schedule, as well as teams that did not do so well but had a tough schedule.
In this paper we propose a temporalized version of the original Massey’s method. The idea is the following. For a given team $i$ and time $t$, the original Massey rates $i$ according to the point spread of $i$ up to time $t$ and the ratings of the teams that $i$ matched up to time $t$. Notice, however, that the rating of a matched team $j$ is computed with respect to time $t$, and not, as we argue would be more reasonable, with respect to the (possibly previous) time when $i$ and $j$ matched. Suppose, for instance, that $i$ and $j$ matched at time $t' < t$, when team $j$ was strong (high in the ranking), and now, at time $t$, team $j$ has lost positions in the ranking and is thus weaker. The original Massey’s method adds to the rating of $i$ the current low rating of $j$ computed at time $t$, and not the past high rating of $j$ computed at time $t'$. The temporalized Massey’s method we propose solves this issue. At any given time $t$ of the season, the temporalized Massey’s method rates a team $i$ according to (a) the point spread of $i$ up to time $t$, and (b) the ratings of the teams that $i$ matched up to time $t$, computed with respect to the time they matched.
The paper is organized as follows. Section 2 reviews the original Massey’s method. We propose the temporalized interpretation of the Massey’s method in Section 3. In Section 3.1 we investigate the algebra of the proposed method while in Section 3.2 we apply it to the last Italian soccer championship. We review related methods for sport rating in Section 4. Finally, we conclude in Section 5.
2 The Massey’s method for sports ranking
In this section we offer a brief introduction to the original Massey’s method. A more general introduction can be found in [9]. The main idea of Massey’s method, as proposed in [12], is enclosed in the following equation:
$$r_i - r_j = y_k$$
where $r_i$ and $r_j$ are the ratings of teams $i$ and $j$ and $y_k$ is the margin of victory for game $k$ of team $i$. If there are $n$ teams who played $m$ games, we have a linear system:
$$X r = y \tag{1}$$
where $X$ is an $m \times n$ matrix such that the $k$-th row of $X$ contains all 0s with the exception of a $1$ in location $i$ and a $-1$ in location $j$, meaning that team $i$ beat team $j$ in match $k$ (if match $k$ ends with a draw, either the $i$ or the $j$ location can be assigned $1$, and the other $-1$). Observe that, if $e$ denotes the vector of all $1$’s, then $X e = 0$. Let $M = X^T X$ and $p = X^T y$. Notice that
$$M_{i,j} = \begin{cases} -(\text{number of matches between } i \text{ and } j) & \text{if } i \neq j, \\ \text{number of games played by } i & \text{if } i = j, \end{cases}$$
and $p_i$ is the signed sum of point spreads of every game played by $i$. Clearly the entries of $p$ sum to $0$, in fact $e^T p = e^T X^T y = (X e)^T y = 0$. The Massey’s method is then defined by the following linear system:
$$M r = p \tag{2}$$
which corresponds to the least squares solution of system (1).
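As an illustration, the system $M r = p$ can be assembled and solved in a few lines. The sketch below is not taken from the paper: the function name `massey_ratings` and the tiny three-team schedule are hypothetical, and it assumes the common convention of replacing the last equation by the constraint that the ratings sum to zero, which picks one solution of the singular system when the graph of the matches is connected.

```python
from fractions import Fraction

def massey_ratings(n, games):
    """Solve M r = p together with the constraint sum(r) = 0.

    games: list of (i, j, margin), margin = points of i minus points of j.
    Assumes the graph of the matches is connected.
    """
    M = [[Fraction(0)] * n for _ in range(n)]
    p = [Fraction(0)] * n
    for i, j, margin in games:
        M[i][i] += 1          # diagonal: games played
        M[j][j] += 1
        M[i][j] -= 1          # off-diagonal: minus matches between i and j
        M[j][i] -= 1
        p[i] += margin        # signed sum of point spreads
        p[j] -= margin
    # M is singular (M e = 0): replace the last equation by sum(r) = 0.
    M[-1] = [Fraction(1)] * n
    p[-1] = Fraction(0)
    # Gauss-Jordan elimination on the augmented matrix [M | p].
    A = [row + [b] for row, b in zip(M, p)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [x / A[c][c] for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]

# Hypothetical schedule: team 0 beats team 1 by 2, team 1 beats team 2 by 1.
ratings = massey_ratings(3, [(0, 1, 2), (1, 2, 1)])
print(ratings)
```

Exact rational arithmetic is used only to keep the example free of rounding noise; a floating-point least squares routine would serve equally well on real data.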
We observe how the Massey’s team ratings are in fact interdependent. Indeed, Massey’s matrix $M$ can be decomposed as
$$M = D - A$$
where $D$ is a diagonal matrix with $D_{i,i}$ equal to the number of games played by team $i$, and $A$ is a matrix with $A_{i,j}$ equal to the number of matches played by team $i$ against team $j$. Hence, linear system (2) is equivalent to
$$D r - A r = p$$
or, equivalently
$$r = D^{-1}(A r + p) = D^{-1} A r + D^{-1} p$$
That is, for any team $i$
$$r_i = \frac{1}{D_{i,i}} \sum_{j} A_{i,j}\, r_j + \frac{p_i}{D_{i,i}} \tag{4}$$
This means, and the same observation can be found in [9], that the rating $r_i$ of team $i$ is the sum of two meaningful components:
1. the mean rating of teams that $i$ has matched
$$\frac{1}{D_{i,i}} \sum_{j} A_{i,j}\, r_j$$
2. the mean point spread of team $i$
$$\frac{p_i}{D_{i,i}}$$
It is worth pointing out that the ratings computed by Massey’s method correspond to averages. Hence, it could happen that a team that performs well in a limited number of matches against strong teams obtains an extremely high and unjustified rating. This effect has been clearly discussed in [3]. To overcome this problem the authors propose to introduce a dummy team that defeats all the teams that played a number of matches below a suitable cutoff.
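In order to better understand the behaviour of the method, it is interesting to analyse what happens to Massey’s system at the end of the season, assuming a round-robin competition in which all teams matched all other teams exactly once. In this case, the opponents rating component is
$$\frac{1}{D_{i,i}} \sum_{j} A_{i,j}\, r_j = \frac{1}{n-1} \sum_{j \neq i} r_j = -\frac{r_i}{n-1}$$
where we have used the fact that $\sum_{j} r_j = 0$, and the point spread component is
$$\frac{p_i}{D_{i,i}} = \frac{p_i}{n-1}$$
hence
$$r_i = -\frac{r_i}{n-1} + \frac{p_i}{n-1}$$
and thus
$$r_i = \frac{p_i}{n}$$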
Hence, the final rating of a team is simply the mean point spread of the team. It is possible to be a bit more precise about this property of Massey’s method by exploiting the properties of the set of eigenvalues, or spectrum, of the Laplacian matrix $M$. The spectrum reflects various aspects of the structure of the graph associated with $M$, in particular those related to connectedness. It is well known that the Laplacian is singular and positive semidefinite (recall that $M e = 0$ and $M = X^T X$) so that its eigenvalues are nonnegative and can be ordered as follows:
$$0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$$
It can be shown that $\lambda_n \leq n$, see for example [1]. The multiplicity of $0$ as an eigenvalue of the Laplacian can be shown to be equal to the number of connected components of the graph, see again [1]. If the graph of the matches is connected or, equivalently, $M$ is irreducible, as we assume in the following, $\lambda_2 > 0$ is known as the algebraic connectivity of the graph and is an indicator of the effort to be employed in order to disconnect the graph.
We can write the spectral decomposition of $M$ as $M = U \Lambda U^T$ where $U$ is orthogonal and its first column is equal to $e/\sqrt{n}$, and $\Lambda = \mathrm{diag}(0, \lambda_2, \ldots, \lambda_n)$. From $M r = p$ we obtain $r = U \Lambda^{+} U^T p$ where $\Lambda^{+} = \mathrm{diag}(0, 1/\lambda_2, \ldots, 1/\lambda_n)$. Now
$$r - \frac{p}{n} = U \left( \Lambda^{+} - \frac{1}{n} I \right) U^T p$$
where $I$ is the identity matrix. Observe that the first component of the vector $U^T p$ is equal to zero so that
$$r - \frac{p}{n} = U\, \Gamma\, U^T p$$
where $\Gamma = \mathrm{diag}(0, 1/\lambda_2 - 1/n, \ldots, 1/\lambda_n - 1/n)$. If we denote with $\|\cdot\|$ the Euclidean norm we obtain
$$\left\| r - \frac{p}{n} \right\| \leq \|\Gamma\|\, \|p\| = \left( \frac{1}{\lambda_2} - \frac{1}{n} \right) \|p\|$$
where we used the fact that the Euclidean norm of an orthogonal matrix is equal to one. Hence, as the algebraic connectivity $\lambda_2$, as well as the other eigenvalues, approach $n$, that is, as more and more matches are played, the vector $r$ approaches $p/n$ and the equality is reached when the graph of the matches becomes complete.
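The limiting case can be checked numerically without computing any eigenvalue. The sketch below builds Massey’s matrix and the spread vector for a complete single round robin and verifies that the vector of spreads divided by the number of teams solves the Massey system and sums to zero. The helper name `massey_system` and the margins are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def massey_system(n, games):
    """Return (M, p) for games given as (i, j, margin of i over j)."""
    M = [[0] * n for _ in range(n)]
    p = [0] * n
    for i, j, margin in games:
        M[i][i] += 1
        M[j][j] += 1
        M[i][j] -= 1
        M[j][i] -= 1
        p[i] += margin
        p[j] -= margin
    return M, p

n = 4
# Hypothetical single round robin: team i beats team j by j - i points.
games = [(i, j, j - i) for i, j in combinations(range(n), 2)]
M, p = massey_system(n, games)

# Candidate solution: the spread of each team divided by the number of teams.
r = [Fraction(x, n) for x in p]
Mr = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
print(Mr == p, sum(r) == 0)
```

For a complete graph the Laplacian has $n$ on the diagonal and $-1$ elsewhere, so multiplying it by a zero-sum vector simply scales that vector by $n$, which is why the check succeeds for any choice of margins.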
3 Temporalized Massey’s method
We propose a temporalized variant of the original Massey’s method. The main idea of the new proposal is to compute the rating of a matched team with respect to the time when the match was played, and not with respect to the current time, as Massey does.
We consider a temporal process of matches between pairs of teams that occur at a given time. Each element of the process is a triple $(i, j, t)$ where $i$ and $j$ are the teams that matched and $t$ is the time of the match. Time is discrete and is represented with natural numbers $1, 2, \ldots$. We assume that each team plays at most one match at any given time. Matches (of different teams) that occur at the same time are considered to happen simultaneously.
Let $s_i(t)$ be the difference of the points for team $i$ and the points against team $i$ in the match of time $t$, where we assume $s_i(t) = 0$ if $i$ does not play at time $t$. Let $g_i(t)$ be the number of games that team $i$ played until time $t$. Let $j_1, \ldots, j_{g_i(t)}$ be the teams matched by $i$ until time $t$ and $t_1, \ldots, t_{g_i(t)}$ be the timestamps of these matches. Then the rating $r_i(t)$ of team $i$ at time $t$ is defined as follows. We set $r_i(0) = 0$ for all teams $i$. Hence all teams are initially equally ranked. For any team $i$, if $i$ did not play so far, that is $g_i(t) = 0$, then its rating is still null. Otherwise, if $g_i(t) > 0$, we have that, for every $t \geq 1$:
$$r_i(t) = \frac{1}{g_i(t)} \sum_{k=1}^{g_i(t)} r_{j_k}(t_k - 1) + \frac{1}{g_i(t)} \sum_{k=1}^{g_i(t)} s_i(t_k) \tag{5}$$
This means that the rating of team $i$ at time $t$ is the sum of two meaningful components:
- the mean historical rating of teams that $i$ has matched:
$$\frac{1}{g_i(t)} \sum_{k=1}^{g_i(t)} r_{j_k}(t_k - 1)$$
- the mean point spread of team $i$ at time $t$:
$$\frac{1}{g_i(t)} \sum_{k=1}^{g_i(t)} s_i(t_k)$$
Notice that we set $r_i(0) = 0$ for all teams, meaning that at the start of the competition all teams are considered equal. This might not always be realistic: we sometimes know that some teams are potentially stronger than others. Hence, an alternative solution is to set $r_i(0) = \rho_i$, where $\rho_i$ is the exogenous strength of $i$ before the competition starts. For instance, we can set the exogenous strength to be proportional to the rating of the team at the end of the previous season.
We illustrate the proposed method with the following simple example (a complete application is discussed in Section 3.2). The table below shows the results of 6 matches (numbered from 1 to 6), divided into 3 days, each representing a different time (numbered from 1 to 3), involving 4 fictitious teams (labelled A, B, C, D):
[TABLE]
While there is no doubt that A is the leader of the ranking (it won all matches) and D is the weakest team (it lost all matches), the challenge between B and C is more controversial: each has won one match, lost another match and drew when they matched together.
The following spread matrix contains the cumulative spread of each team at each day. Initially B has a small advantage over C, which is maintained in the second day and lost in the last day, when they finish with the same spread. Notice that the spread of the last day corresponds, up to a multiplicative constant, to the original Massey rating (see Section 2). Hence, according to the spread or to the original Massey’s method, there is no difference between B and C at the end of the season.
[TABLE]
However, the temporalized Massey’s method tells us a different story. The following matrix contains the temporalized Massey rating for each day and each team:
[TABLE]
The first day the rating is exactly the spread, hence B has a little advantage over C. Interestingly, this advantage is lost at day 2, while the spread is still in favor of B. The reason is that at day 2, teams B and C matched together and they drew. However, before the match (at day 1), B was stronger than C, hence C drew against a stronger team than B did. Finally, at day 3, B is over C in the ranking (while the spread is equal). In fact, at day 3, B lost, but against the strongest team of the competition (A), and C won, but against the weakest team of the competition (D). In summary, B and C drew the match together (but when B was stronger), and then they both lost against A and won against D. But the subtle difference, which is captured only by the temporalized version of Massey, is that B lost against A at day 3, when A was the strongest team, while C lost against A at day 1, when A was as strong as all other teams. Similarly, B won against D at day 1, when D was as strong as all other teams, while C won against D at day 3, when D was the weakest team. This determines the difference in the final ranking of the temporalized Massey’s method.
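The mechanics of this example can be sketched directly from the definition: a team’s rating is the mean of its opponents’ ratings taken just before each match, plus its mean point spread. The scores below are hypothetical, chosen only to reproduce the qualitative story (A wins everything, D loses everything, B and C draw at day 2 and end with equal spreads); the function name `temporalized_massey` is likewise an illustrative choice.

```python
from fractions import Fraction

def temporalized_massey(teams, days):
    """Return history, where history[t][x] is the rating of team x at time t.

    days[t - 1] lists the matches of time t as (a, b, points_a, points_b).
    All teams start with rating 0.
    """
    history = [{x: Fraction(0) for x in teams}]
    opponents = {x: [] for x in teams}   # opponent ratings just before each match
    spread = {x: 0 for x in teams}       # cumulative point spread
    for matches in days:
        previous = history[-1]
        current = dict(previous)         # teams that do not play keep their rating
        for a, b, pa, pb in matches:
            opponents[a].append(previous[b])
            opponents[b].append(previous[a])
            spread[a] += pa - pb
            spread[b] += pb - pa
            for x in (a, b):
                g = len(opponents[x])
                current[x] = (sum(opponents[x]) + spread[x]) / g
        history.append(current)
    return history

# Hypothetical scores consistent with the narrative of the example.
days = [
    [("B", "D", 1, 0), ("A", "C", 1, 0)],   # day 1
    [("B", "C", 0, 0), ("A", "D", 3, 0)],   # day 2
    [("A", "B", 1, 0), ("C", "D", 1, 0)],   # day 3
]
history = temporalized_massey("ABCD", days)
for t, ratings in enumerate(history):
    print(t, {x: str(v) for x, v in ratings.items()})
```

With these scores B and C are tied at day 2 and B finishes above C at day 3, even though their final cumulative spreads are both zero.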
3.1 A closer look to temporalized Massey’s method
Let us consider more closely the temporalized Massey’s equation (5). Clearly, if at time $t$ team $i$ does not play then $r_i(t) = r_i(t-1)$. On the contrary, suppose that at time $t$ team $i$ matches with team $j$ (in other words $j = j_k$ and $t = t_k$ for some $k$). Then the rating of $i$ at time $t$ can be defined in terms of the ratings at $t-1$ of teams $i$ and $j$ as well as the point spread of team $i$ at the current time $t$:
$$r_i(t) = \frac{(g_i(t) - 1)\, r_i(t-1) + r_j(t-1) + s_i(t)}{g_i(t)} \tag{6}$$
Similarly, the rating of $j$ at time $t$ is:
$$r_j(t) = \frac{(g_j(t) - 1)\, r_j(t-1) + r_i(t-1) + s_j(t)}{g_j(t)} \tag{7}$$
Notice that losing against a strong team can still make the day for the loser, but winning against a weak team can result in a drop of the rating of the winner. We can rewrite Equation (6) as follows:
$$r_i(t) = \alpha_i(t)\, r_i(t-1) + \beta_i(t)\, r_j(t-1) + \beta_i(t)\, s_i(t) \tag{8}$$
where $\alpha_i(t) = (g_i(t) - 1)/g_i(t)$ and $\beta_i(t) = 1/g_i(t)$. Notice that $\alpha_i(t) + \beta_i(t) = 1$. Hence, the rating of team $i$ at time $t$ is a convex combination of the ratings at time $t-1$ of team $i$ and of the matched team $j$, plus a fraction $\beta_i(t)$ of the spread of $i$ at time $t$. Of course, by expanding recurrence (8) one obtains back equation (5).
We would like to draw the attention of the reader to the fact that the coefficients $\alpha_i(t)$ and $\beta_i(t)$ vary in time. More precisely, as the number of games $g_i(t)$ of team $i$ grows, the component $\alpha_i(t)$ approaches $1$ and $\beta_i(t)$ vanishes to $0$. This means that, if $i$ played few matches and hence $g_i(t)$ is small, then the latest performance of $i$ can make a significant difference in the ranking position of team $i$. On the other hand, as $g_i(t)$ grows, new results can only slightly move the ranking position of the team. This is coherent with the general idea that an established reputation is difficult to shake.
Interestingly, if teams $i$ and $j$ played the same number of matches at time $t$, that is $g_i(t) = g_j(t)$, it is easy to realize that, after a match between $i$ and $j$, we have that $r_i(t) + r_j(t) = r_i(t-1) + r_j(t-1)$. This means that what one team gains is lost by the other, and the cumulative rating of the system is the same before and after the match. In particular, in a round-robin competition in which at each day each team matches another team not matched before, it happens that, if initially all teams have rating equal to 0, at any day the cumulative rating of all teams in the competition is 0. It is worth noticing that this property holds also for the original Massey’s method but is lost if teams play a different number of games.
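The incremental form lends itself to a one-match update rule: with $g$ the number of games of a team including the current one, the new rating is $(g-1)/g$ times the old rating plus $1/g$ times the sum of the opponent’s previous rating and the current spread. The sketch below applies this rule to a hypothetical three-day schedule (scores invented for illustration) and records the total rating after each day to illustrate the conservation property.

```python
from fractions import Fraction

def update(rating, games, i, j, spread_i):
    """Apply one match between i and j in place.

    spread_i is the points of i minus the points of j in this match.
    """
    ri, rj = rating[i], rating[j]        # ratings before the match
    for team, own, opp, s in ((i, ri, rj, spread_i), (j, rj, ri, -spread_i)):
        games[team] += 1
        beta = Fraction(1, games[team])  # weight of opponent rating and spread
        alpha = 1 - beta                 # weight of the team's own past rating
        rating[team] = alpha * own + beta * opp + beta * s

teams = "ABCD"
rating = {x: Fraction(0) for x in teams}
games = {x: 0 for x in teams}
# Hypothetical schedule: (team i, team j, spread of i).
days = [
    [("B", "D", 1), ("A", "C", 1)],
    [("B", "C", 0), ("A", "D", 3)],
    [("A", "B", 1), ("C", "D", 1)],
]
totals = []
snapshots = []
for matches in days:
    for i, j, s in matches:
        update(rating, games, i, j, s)
    totals.append(sum(rating.values()))
    snapshots.append(dict(rating))
print(totals)
```

In this schedule team C wins at day 3 against the weakest team and still moves from 0 to a negative rating, while B loses to the strongest team and moves from 0 to a positive one, which is the effect described above.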
From (6) it follows that every rating is a linear combination of spreads whose nonnegative coefficients can be placed in a matrix such that
[TABLE]
From (6) it is possible to obtain an equivalent relation for these matrices in the case where matches with at time
[TABLE]
where if and otherwise. Clearly only the first columns of contain entries different from zero.
As an example let us consider again the fictitious teams A, B, C and D of the previous example, which it is now convenient to denote with the integers from $1$ to $4$. In this simple example every team plays at each time, hence $g_i(t) = t$. Therefore Equation (9) becomes
[TABLE]
and this yields
[TABLE]
where only the nontrivial columns of the matrices are shown. Of course if the teams are involved in a round robin competition then in the 4th day and match together again and
[TABLE]
where, as before, only the nontrivial columns of the matrix are shown. It is possible to verify that for are just row permutations of .
Notice that the sum of the coefficients in the columns of the matrices in our example has a quite regular behaviour. Let us denote with the -th column of . By using (10), for we obtain
[TABLE]
that is true in particular for . Making use of induction we obtain for
[TABLE]
As a consequence, the sum of the entries of is equal to for each team . The number is known as the -th harmonic number. It holds that
[TABLE]
It is well known that where is known as Euler-Mascheroni constant. This implies that the range of the ratings of temporalized Massey’s method increase very slowly in . For example . Moreover, the above inequality tells us that ratings and spreads, which are added up in the temporalized Massey’s equation, are of the same order of magnitude.
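The harmonic-number behaviour can be checked on a small schedule. The sketch below is an independent reconstruction rather than the paper’s own matrices: it assumes that, when a team plays its $g$-th game, its coefficients are obtained from the rating update rule by scaling its own previous coefficients by $(g-1)/g$, adding the opponent’s previous coefficients divided by $g$, and giving weight $1/g$ to the current spread. The names `coefficient_sums` and `harmonic` are hypothetical.

```python
from fractions import Fraction

def coefficient_sums(teams, days):
    """Return, for each day t, {team: sum of the coefficients of its rating}.

    coef[i][(h, k)] is the weight of the spread of team h at time k
    in the current rating of team i. days[t - 1] lists the pairs playing at t.
    """
    coef = {x: {} for x in teams}
    games = {x: 0 for x in teams}
    sums = []
    for t, matches in enumerate(days, start=1):
        old = {x: dict(c) for x, c in coef.items()}   # coefficients at time t - 1
        for i, j in matches:
            for a, b in ((i, j), (j, i)):
                games[a] += 1
                g = games[a]
                new = {key: w * Fraction(g - 1, g) for key, w in old[a].items()}
                for key, w in old[b].items():
                    new[key] = new.get(key, 0) + w / g
                new[(a, t)] = Fraction(1, g)          # weight of the current spread
                coef[a] = new
        sums.append({x: sum(coef[x].values()) for x in teams})
    return sums

def harmonic(n):
    """n-th harmonic number 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

teams = "ABCD"
days = [[("A", "C"), ("B", "D")], [("B", "C"), ("A", "D")], [("A", "B"), ("C", "D")]]
sums = coefficient_sums(teams, days)
print([str(harmonic(t)) for t in (1, 2, 3)])
```

When every team plays at every time, each team’s coefficients add up to the harmonic number of the current day, regardless of who played whom.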
It is worth noticing that the temporalized Massey’s rating of team $i$ at time $t$ is a linear combination of past spreads (performances) of all teams, not just of team $i$, with multiplicative coefficients described by the coefficient matrix introduced above. This contrasts with the original Massey’s rating for team $i$. Indeed, as shown in Section 2, as time goes on, the original Massey’s rating for $i$ approaches a linear combination of past performances of $i$, without considering the performances of other teams.
It is interesting to observe that, if the teams have exogenous initial strengths, then the linear combination of spreads has to be complemented with a linear combination of them. For example, in order to compute , one has to add to the combination of spreads whose coefficient appear in , the value obtained from
[TABLE]
since the first match of is against and the first match of is against .
Finally, it is useful to compare recurrence (8) with its constant coefficient equivalent, namely:
$$r_i(t) = \alpha\, r_i(t-1) + \beta\, r_j(t-1) + \beta\, s_i(t) \tag{12}$$
where now $\alpha, \beta$ are constant with $\alpha + \beta = 1$, and again $t$ is the timestamp of the match of $i$ with $j$. By expanding this recurrence, with $r_i(0) = 0$, we obtain
$$r_i(t) = \beta \sum_{k=1}^{g_i(t)} \alpha^{\,g_i(t) - k} \left( r_{j_k}(t_k - 1) + s_i(t_k) \right) \tag{13}$$
where $g_i(t)$ is the number of games that team $i$ played until time $t$, while $j_1, \ldots, j_{g_i(t)}$ are the teams matched by $i$ until time $t$, and $t_1, \ldots, t_{g_i(t)}$ are the timestamps of these matches. Comparing Equations (5) and (13), we capture the difference between the varying and constant coefficient recurrences. In Equation (5), past performances of a team are treated homogeneously, while with Equation (13) the past is progressively forgotten, giving more importance to recent performances, and this forgetfulness is quicker if $\alpha$ is small (close to 0).
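The contrast between the two weighting schemes can be made concrete. The sketch below runs the constant-coefficient recurrence on a short, hypothetical match history for one team (pairs of opponent rating before the match and spread), checks that it agrees with the expanded geometric sum, and lists the weight each past match receives next to the uniform weights of the varying-coefficient method. The function names and the value of $\alpha$ are illustrative assumptions.

```python
from fractions import Fraction

def constant_recurrence(history, alpha):
    """Iterate r <- alpha * r + beta * (opponent rating + spread), starting from 0."""
    beta = 1 - alpha
    r = Fraction(0)
    for opponent_rating, spread in history:
        r = alpha * r + beta * opponent_rating + beta * spread
    return r

def expanded_form(history, alpha):
    """Closed form: beta * sum over k of alpha^(g - k) * (opponent rating + spread)."""
    beta = 1 - alpha
    g = len(history)
    return beta * sum(
        alpha ** (g - k) * (opp + s) for k, (opp, s) in enumerate(history, start=1)
    )

alpha = Fraction(1, 2)   # hypothetical forgetting parameter
# Hypothetical history, oldest match first: (opponent rating before match, spread).
history = [(Fraction(0), 1), (Fraction(-1), 0), (Fraction(3, 2), -1)]

g = len(history)
weights_constant = [(1 - alpha) * alpha ** (g - k) for k in range(1, g + 1)]
weights_varying = [Fraction(1, g)] * g
print(constant_recurrence(history, alpha), expanded_form(history, alpha))
print([str(w) for w in weights_constant], [str(w) for w in weights_varying])
```

With $\alpha = 1/2$ the most recent match carries four times the weight of the oldest one, whereas the varying-coefficient recurrence weights all three matches equally.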
To obtain an alternative intuition of this difference we study the matrices for our simple round robin example. It is not difficult to obtain
[TABLE]
where only the nontrivial columns of the matrices are shown. In addition
[TABLE]
where again only the nontrivial columns are shown. Notice that, not taking into account the factor , the entries of each column of these matrices sum up to a power of the binomial . Since we assumed , we have that, for ,
[TABLE]
This result highlights the difference between the varying-coefficient and the constant-coefficient techniques: the latter gives progressively more and more importance to the recent matches with respect to the former.
Again, if exogenous initial strengths are present then the linear combination of spreads has to be complemented with a combination of initial strengths. For example in order to compute to the combination of spreads one has to add
[TABLE]
3.2 Application to Italian soccer league
As a more realistic example, we analyse the Italian Serie A soccer league of season 2015-2016. It is a round-robin competition with 20 teams and 38 days (each pair of teams matches twice).
In Figure 1 we depict the Kendall correlation between pairs of ranking methods among temporalized Massey (T-M), original Massey (M), and official ranking (O). As days pass, we accrue more and more information about the real strength of teams, and all correlations increase. In particular at day 38, the end of the season, we have complete information, and the correlation coefficients are close to 1 (0.98 for T-M vs M, 0.93 for M vs O, and 0.91 for T-M vs O), although there are differences in the rankings, in particular when the official compilation is involved. Nevertheless, during the season, when information is partial, the corresponding rankings diverge significantly, and correlation coefficients are far from 1, in particular with respect to the official ranking. For instance the coefficients at day 10 are: 0.80 for T-M vs M, 0.73 for M vs O, and 0.62 for T-M vs O. Moreover, over all days, the association between Massey and official rankings is higher than the association between temporalized Massey and official rankings.
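For readers who want to reproduce this kind of comparison, a rank correlation between two rankings can be computed as follows. This is a minimal sketch of the simplest Kendall coefficient (the difference between concordant and discordant pairs divided by the number of pairs), assuming rankings without ties; the text does not say which variant was used for Figure 1, and the rankings below are hypothetical.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall correlation between two rankings given as {team: position}.

    Assumes both rankings cover the same teams and contain no ties.
    """
    teams = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(teams, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1      # both rankings order x and y the same way
        elif sign < 0:
            discordant += 1      # the two rankings disagree on x and y
    pairs = len(teams) * (len(teams) - 1) // 2
    return (concordant - discordant) / pairs

# Hypothetical rankings of four teams that differ by one adjacent swap.
official = {"A": 1, "B": 2, "C": 3, "D": 4}
massey = {"A": 1, "B": 3, "C": 2, "D": 4}
print(kendall_tau(official, massey))
```

A value of 1 means the two rankings are identical and a value of -1 means one is the reverse of the other.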
A rigorous test for a rating system is foresight prediction accuracy [11]: how well can the vector of ratings computed at day $t$ predict the winners at day $t+1$? More precisely, the foresight prediction accuracy of a method is the number of victories that the method correctly foresaw divided by the total number of victories of that competition (we ruled out the ties). Hence, an accuracy of 0 means no predictions were correct, while an accuracy of 1 means that all predictions were correct. We also computed accuracy introducing a home-field advantage, which was empirically determined for each method and added to the rating of the team playing at home. A home-field advantage matters for foresight prediction in time-varying methods: since initially all teams are rated equal, in the beginning, before there is enough competition to significantly distinguish the teams’ ratings, home-field consideration is the only criterion that the method can use to draw a distinction between two teams. We compared three time-varying rating methods with and without home-field advantage (see Table 1): the official rating of the Italian soccer league, temporalized Massey’s method, and Elo’s method (see Section 4 for a review of this method). Temporalized Massey is slightly more predictive than Elo and significantly better than the official rating. Moreover, for all methods, introducing the home-field advantage has a significant impact on the prediction accuracy. We also computed, for the temporalized Massey’s method, the foresight prediction accuracies at each day of the competition (with home-field advantage). The histogram of accuracies is depicted in Figure 2. Only 2 predictions are below the threshold of 50% of accuracy corresponding to randomness (notice that the 3 predictions in the 40%-50% histogram bar are in fact equal to 50%). On the other hand, most predictions (78%) are above 60% of accuracy, with 12 predictions (32%) above 80% of accuracy and 3 predictions (8%) with 100% of accuracy.
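The evaluation protocol can be sketched as a short loop: rate the teams on the days played so far, add the home-field advantage to the home team, and compare the predicted winner with the actual one on the next day, skipping ties. The code below is an illustrative sketch on hypothetical results, not the authors’ evaluation code; in particular, treating equal predicted strengths as a missed prediction is an assumption, as are the function names.

```python
from fractions import Fraction

def temporalized_massey(days):
    """Ratings after the given days; a match is (home, away, home_goals, away_goals)."""
    rating, games = {}, {}
    for matches in days:
        for home, away, hg, ag in matches:
            rh = rating.get(home, Fraction(0))
            ra = rating.get(away, Fraction(0))
            for team, own, opp, s in ((home, rh, ra, hg - ag), (away, ra, rh, ag - hg)):
                games[team] = games.get(team, 0) + 1
                rating[team] = ((games[team] - 1) * own + opp + s) / games[team]
    return rating

def foresight_accuracy(days, rate, hfa=0):
    """Fraction of decided matches whose winner is predicted from earlier days only."""
    correct = decided = 0
    for t, matches in enumerate(days):
        ratings = rate(days[:t])          # information available before this day
        for home, away, hg, ag in matches:
            if hg == ag:
                continue                  # ties are ruled out
            decided += 1
            home_strength = ratings.get(home, 0) + hfa
            away_strength = ratings.get(away, 0)
            if home_strength == away_strength:
                continue                  # no prediction: counted as a miss (assumption)
            if (home_strength > away_strength) == (hg > ag):
                correct += 1
    return correct / decided

# Hypothetical results of a three-day mini league.
days = [
    [("B", "D", 1, 0), ("A", "C", 1, 0)],
    [("B", "C", 0, 0), ("A", "D", 3, 0)],
    [("B", "A", 0, 1), ("C", "D", 1, 0)],
]
acc_plain = foresight_accuracy(days, temporalized_massey)
acc_hfa = foresight_accuracy(days, temporalized_massey, hfa=Fraction(1, 2))
acc_large_hfa = foresight_accuracy(days, temporalized_massey, hfa=2)
print(acc_plain, acc_hfa, acc_large_hfa)
```

On this toy data a moderate home-field advantage settles the otherwise undecidable first day, while an excessive one overrides the ratings and costs a correct prediction, which is why the advantage is tuned empirically for each method.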
Related to prediction accuracy, consider the following story. Teams Inter and Juventus had a peculiar season in 2015-2016. Inter immediately won the first matches, but with a low spread of points. On the other hand, the start of Juventus was disastrous. This put Inter well above Juventus in the official ranking, with a maximum distance of 10 points at days 5 and 6. From day 10, however, Juventus started an incredible winning streak, culminating at day 19 when the two teams were level in the official standings. Finally, at day 38, Juventus convincingly won the championship, 24 points ahead of Inter. In Figure 3 we depict the temporal dynamics of the official, original Massey, and temporalized Massey rankings during the first round of the championship. The superiority of Juventus with respect to Inter is not witnessed by the official ranking until the end of the round. On the other hand, Massey and in particular its temporalized version predicted this supremacy well before the end of the round.
4 Related literature
A recent account of dynamic modelling of sports tournaments can be found in [2]. In that paper, only the outcomes (win-draw-loss) of the matches, and not point spreads, are considered. The abilities of the home and visiting teams are assumed to evolve separately in time following an exponentially weighted moving average process ruled by a constant coefficients linear recurrence. In our approach the two abilities are intertwined and the evolution is described by a variable coefficients recurrence.
A good survey of dynamic models for team strengths in the NFL can be found in [9]. Generally teams’ abilities are assumed to evolve through a first order autoregressive process. For example in [10] this strategy is used to model season to season changes of teams’ abilities while in [8] it is used to model week to week changes. As we explained in Section 3, due to the variability of the coefficients of recurrence (8) our approach gives, as the season proceeds, a greater importance to the history of the results compared with the one given by an autoregressive model.
In [4] the authors propose nonuniform weighting for sports rankings. Their technique makes it possible to weight late season play differently but also, for example, home court advantage or high-pressure games. Their target application is using the matches of Division I NCAA in order to produce brackets for the famous NCAA Men’s Division I Basketball Tournament, also known as March Madness. For Massey’s method this idea is implemented by placing the weights in a diagonal matrix $W$ and by solving, instead of system (1), the corresponding weighted least squares system. Notice that this is equivalent to the substitution of the two means present in (4) with two weighted means, whose weights are the diagonal entries of $W$. The authors discuss and experiment with various strategies for choosing the weights: in the simplest one the weights linearly increase from the first day of the season to the last day.
The authors also apply their weighting technique to another popular ranking method, namely Colley’s method [5]. It is important to remark that the temporalization technique that we developed for Massey’s method can easily be extended to Colley’s method. The Equation (3) in [4] is at the heart of Colley’s method and can be rewritten with our notations as follows
[TABLE]
where and , with , are respectively the number of wins and of losses of team . Our temporalized variant of Colley’s method is ruled by the following equation
[TABLE]
where now and , with , are respectively the number of wins and of losses of team up and including time .
A popular time-varying rating system used in sport competitions is Elo’s method [6, 11]. It was devised by the physics professor and excellent chess player Arpad Elo. Let $s_{i,j}$ be the score of team $i$ against team $j$; for instance, in chess a win is given a score of 1 and a draw a score of 1/2 (and a defeat a score of 0). Let $\mu_{i,j}$ be the number of points that team $i$ is expected to score against team $j$; this is typically computed as a logistic function of the difference of ratings between the players, for instance,
$$\mu_{i,j} = \frac{1}{1 + 10^{-d_{i,j}/\xi}}$$
where $d_{i,j} = r_i - r_j$ and $\xi$ is a constant (in the chess world $\xi = 400$). Then, when teams $i$ and $j$ match, the rating of team $i$ is updated as follows (and similarly for $j$):
$$r_i \leftarrow r_i + \kappa\, (s_{i,j} - \mu_{i,j})$$
where $\kappa$ is a constant (for instance, in chess larger values of $\kappa$ are used for new players). Hence, beating a stronger player has a larger reward than beating a weaker one. Notice the intriguing similarity of Elo’s update equation with Equation (8) defining temporalized Massey’s method. Both methods update the old rating of a team in terms of the same ingredients: the current performance of the team and the rating of the opponent team. However, the two methods mix these ingredients in different ways, and hence the resulting recipe differs. While Elo uses a logistic (exponential) function to mix performance and opponent rating, Massey linearly combines the two. Moreover, the combination parameters $\kappa$ and $\xi$ in Elo are constant, while the combination parameters $\alpha_i(t)$ and $\beta_i(t)$ of temporalized Massey vary with the team and in time.
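For comparison with the linear update of temporalized Massey, the Elo update can be sketched as below. The logistic expectation with scale 400 is the standard chess convention; the step size `kappa=20.0` is an arbitrary illustrative default, not a value taken from the text, and the function names are hypothetical.

```python
def elo_expected(r_i, r_j, xi=400.0):
    """Expected score of i against j as a logistic function of the rating gap."""
    return 1.0 / (1.0 + 10 ** (-(r_i - r_j) / xi))

def elo_update(r_i, r_j, score_i, kappa=20.0, xi=400.0):
    """Return the new ratings of i and j after a match.

    score_i is 1 for a win of i, 0.5 for a draw and 0 for a defeat.
    kappa is a hypothetical step size chosen for illustration.
    """
    delta = kappa * (score_i - elo_expected(r_i, r_j, xi))
    # The expected scores of the two sides sum to 1, so j moves by the opposite amount.
    return r_i + delta, r_j - delta

# Beating a stronger opponent is rewarded more than beating a weaker one.
gain_vs_strong = elo_update(1500.0, 1700.0, 1)[0] - 1500.0
gain_vs_weak = elo_update(1500.0, 1300.0, 1)[0] - 1500.0
print(gain_vs_strong, gain_vs_weak)
```

Unlike the temporalized Massey coefficients, which shrink as a team plays more games, the step size here stays the same for every match.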
5 Conclusion
We introduced a temporalized version of the popular Massey’s method for rating actors in sport competitions. The idea of the new method is quite simple: to rate the matched team with respect to the time when the match was played. We showed that the resulting method can be described as a dynamic temporal process in which the rating of any team $i$ is modified when $i$ matches some other team $j$, and the update of the rating is a function of the performance of $i$ during the match with $j$ and of the rating of $j$ before the match. We applied the new method to the Italian soccer league, showing a good foresight prediction accuracy.
In fact, the idea of temporalizing the Massey’s method we have proposed in this context can be generalized to any recursive centrality measure on networks. Consider for instance Pagerank centrality [7], which claims that a node is important if it is linked to by other important nodes. For instance, a scholar is relevant if he or she is cited by relevant scholars, or a Web page is important if it is hyperlinked to by other important Web pages. The original definition of the Pagerank method ignores the time of creation of the link between nodes. However, we argue that it is different if we, as scholars, receive an endorsement from a young and almost unknown author, or from the same author when she has won the Turing award. Similarly, there is a difference in receiving a link from a peripheral Web page or from the same page when it has become a central hub. We look forward to a temporalized version of Pagerank with an application to sport competitions.
References
- [1] A. E. Brouwer and W. H. Haemers. Spectra of graphs. Universitext. Springer, New York, 2012.
- [2] M. Cattelan, C. Varin, and D. Firth. Dynamic Bradley-Terry modelling of sports tournaments. Journal of the Royal Statistical Society Series C Applied Statistics, 62(1):135–150, 2013.
- [3] T. P. Chartier, J. Harris, K. R. Hutson, A. N. Langville, D. Martin, and C. D. Wessel. Reducing the effects of unequal number of games on rankings. IMAGE The Bulletin of the International Linear Algebra Society, 52:15–23, 2014.
- [4] T. P. Chartier, E. Kreutzer, A. N. Langville, and K. E. Pedings. Sports ranking with nonuniform weighting. Journal of Quantitative Analysis in Sports, 7(3), 2011.
- [5] W. N. Colley. Colley’s bias free college football ranking method: The Colley matrix explained. Available at http://www.colleyrankings.com/matrate.pdf, 2002.
- [6] A. E. Elo. The Rating of Chess Players, Past and Present. Arco, New York, 1978.
- [7] M. Franceschet. PageRank: Standing on the shoulders of giants. Communications of the ACM, 54(6):92–101, 2011.
- [8] M. E. Glickman and H. S. Stern. A state-space model for National Football League scores. Journal of the American Statistical Association, 93(441):25–35, 1998.
